RTX 6000 Ada slower than GeForce RTX 3050 in Python with TensorFlow 2?

Hi everyone,

I am seeing a noticeable performance difference between our old and our new setup.

I am running a biological analysis (with the mRNA trajectory inference tool UniTVelo) that uses TensorFlow 2 for GPU acceleration, inside a mambaorg/micromamba:jammy-cuda-11.8.0 Docker image; the same image is used on both systems for comparison.

Our old system had an NVIDIA GeForce RTX 3050; the new system runs an NVIDIA RTX 6000 Ada. The old system takes 30 min for a complete run of the analysis, whereas the new system needs 42 min.

Is this expected? Or is it somehow due to the fact that TensorFlow 2 does not yet support CUDA 12 (hence the CUDA 11.8 Docker image)?


+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.54.03              Driver Version: 535.54.03    CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA RTX 6000 Ada Gene...    On  | 00000000:01:00.0 Off |                    0 |
| 35%   65C    P2              94W / 300W |  44110MiB / 46068MiB |     17%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 520.61.05    Driver Version: 520.61.05    CUDA Version: 11.8     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:01:00.0  On |                  N/A |
| 30%   55C    P2    67W / 130W |   6738MiB /  8192MiB |     85%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

Kind regards,
Chris

This GPU has compute capability 8.9 and CUDA 11.8 should support it. However, you would want to find out whether the CUDA-accelerated software you are using includes SASS (machine code) for sm_89 in the fat binary, otherwise there will be overhead for JIT compilation. You may find out that you must build some or all of the software from source code to include sm_89 support.
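
One quick way to check this from inside the container is to compare the compute capability TensorFlow reports for the GPU with the compute capabilities the installed wheel was built for. This is only a minimal sketch: tf.config.experimental.get_device_details and tf.sysconfig.get_build_info exist in current TensorFlow 2 releases, but the exact dictionary keys (e.g. cuda_compute_capabilities) can vary between versions, so treat missing keys as "unknown".

import tensorflow as tf

# Compute capability of the GPUs TensorFlow can see at runtime.
for gpu in tf.config.list_physical_devices("GPU"):
    details = tf.config.experimental.get_device_details(gpu)
    print(gpu.name, details.get("device_name"), details.get("compute_capability"))

# What the installed wheel was built against; key names may differ by release.
build = tf.sysconfig.get_build_info()
print("Built against CUDA:", build.get("cuda_version"))
print("Embedded compute capabilities:", build.get("cuda_compute_capabilities"))

If sm_89 (compute capability 8.9) is not in the second list, the first run on the RTX 6000 Ada will pay a JIT compilation penalty from the embedded PTX.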

Comparing the performance of two systems only works well if exactly one variable changes between them, in other words, when we are conducting a well-controlled experiment. Are the two systems in question configured identically (same hardware, same software with the same versions, identical configuration files and settings), except that the GPUs differ? If not, you will probably need to do some system-level profiling followed by component-level profiling to see exactly where the performance difference comes from.
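
As a first component-level check, a raw GPU micro-benchmark helps separate pure compute throughput from the rest of the pipeline. The sketch below is my own illustration (a large FP32 matmul, nothing to do with UniTVelo); run it on both systems inside the same container:

import time
import tensorflow as tf

def bench_matmul(n=8192, iters=20):
    with tf.device("/GPU:0"):
        a = tf.random.normal((n, n))
        b = tf.random.normal((n, n))
        tf.linalg.matmul(a, b).numpy()  # warm-up: excludes allocation and JIT cost
        start = time.perf_counter()
        for _ in range(iters):
            c = tf.linalg.matmul(a, b)
        c.numpy()  # force synchronization before stopping the clock
        elapsed = time.perf_counter() - start
    print(f"{n}x{n} matmul: {elapsed / iters * 1e3:.1f} ms/iter, "
          f"{2 * n**3 * iters / elapsed / 1e12:.2f} TFLOP/s")

bench_matmul()

If the RTX 6000 Ada wins clearly here but loses on the full analysis, the bottleneck is most likely outside the GPU (CPU, storage, data loading, or one-time JIT compilation at startup).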

I do not know anything about your software stack, but I would assume that its performance does not rely solely on the GPU, but also on other system components, such as the CPU (cores and clock frequencies), system memory (size and speed), mass storage (speed grade of the NVMe drive), and even interconnects (version and width of the PCIe slots).

Were these systems purchased from an experienced (and NVIDIA-approved) system integrator, or are they self-configured? If the latter, are you confident that the GPUs are in the correct PCIe slots, and that their power and cooling needs are being met?
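
If you want to quickly rule out a degraded PCIe link or a power/thermal cap, nvidia-smi can report both. Here is a small sketch that wraps the standard query fields from Python (assuming nvidia-smi is on the PATH); run it on both systems, ideally while the analysis is active, since the link generation can downshift at idle:

import subprocess

# Query PCIe link state, power draw/limit, and temperature for each GPU.
fields = ",".join([
    "name",
    "pcie.link.gen.current", "pcie.link.gen.max",
    "pcie.link.width.current", "pcie.link.width.max",
    "power.draw", "power.limit",
    "temperature.gpu",
])
print(subprocess.run(
    ["nvidia-smi", f"--query-gpu={fields}", "--format=csv"],
    capture_output=True, text=True, check=True,
).stdout)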

If your machines are actually rented by the hour, which cloud provider and which instance types are you using, and what virtualization software is involved?

Is this expected?

In a system where nothing changes other than replacing the GeForce RTX 3050 with an RTX 6000 Ada, GPU-accelerated software should see a very significant performance increase: the latter GPU offers (from memory!) something like 2x the memory bandwidth and 4x the computational throughput of the former, plus a much larger GPU memory, which by itself is a great performance benefit for many HPC applications.

Thank you very much for the swift response and helpful pointers! I will give them some thought and run more tests.