What is the compute capability of the RTX 40 series?

The compute capability of the 30 series is 8.6 (or 8.9?), but I cannot find the value for the 40 series on the website. I now need to compile TensorFlow with CUDA 12 for an RTX 40 series GPU; which compute capability should I choose, 8.9 or higher?

RTX 40 series GPUs are compute capability 8.9

I know what you mean, and I also got that value from an earlier TensorFlow version. But I am confused by the 40 series at 8.9 vs. the 30 series at 8.6; that increment seems too small to be right. The 10 series is 6.1, the 20 series is 7.5, and the 30 series is 8.6, so I would expect the 40 series to be at least 9.6.

Maybe 8.9 is just the highest value TensorFlow could support before the 40 series existed.

It’s OK if you don’t believe me.

Try running the deviceQuery CUDA sample code on your 40 series GPU.
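
Or, if you don't want to build the samples, a few lines against the CUDA runtime API report the same thing. A minimal sketch of my own (not the deviceQuery sample itself), using cudaGetDeviceProperties:

#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        // prop.major / prop.minor is the compute capability, e.g. 8.9 for Ada
        printf("Device %d: %s, compute capability %d.%d\n",
               dev, prop.name, prop.major, prop.minor);
    }
    return 0;
}

Compile it with nvcc and run it; on an RTX 40 series card it should print 8.9.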

Good luck!

The increments between compute capabilities are not of a prescribed fixed size. The only requirement is that a new GPU architecture is assigned a higher number than any previous architecture. For example, the Ada Lovelace architecture could have been assigned the number 8.8 instead of 8.9 if NVIDIA had wanted that instead. A useful overview of architectures and associated GPUs can be found in the Wikipedia article on CUDA.

NVIDIA appears to be in the habit of assigning a new major version number (integer) to each “major” architecture, and assigning some minor number (fraction) to each architecture derived from such a major architecture. What NVIDIA considers a “major” architecture, and how it picks the numbers to enumerate the derived architectures is a detail we are not privy to and that does not matter for CUDA programmers.

8.9 ~=9 #amIRight? FP4Life!

Tables 14 and 15 here outline some quite significant differences between the two.

Depends on what you mean by significant. I don't do a lot with tensor cores, and I'm still waiting on 64-bit signed integer atomic adds, 17 years into this. I can work around it with:

atomicAdd((unsigned long long int*)pInt64, llitoulli(a));

Where:

// Reinterpret a signed 64-bit integer as unsigned without changing its bits.
static __device__ inline unsigned long long int llitoulli(long long int l)
{
    unsigned long long int u;
    asm("mov.b64 %0, %1;" : "=l"(u) : "l"(l));
    return u;
}

But they did this with Volta and Turing, and now with Ampere and Ada. For me, it's just another architecture to tune L1/SMEM utilization and grid dimensions for. Before that, Celsius to Fermi to Kepler to Maxwell to Pascal to Volta was unambiguous.
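
To make that atomicAdd workaround concrete, here is a minimal self-contained sketch of my own (the kernel name, launch configuration, and test values are just for illustration). Two's-complement addition produces the same bit pattern whether the operands are read as signed or unsigned, so reinterpreting through llitoulli and adding as unsigned yields the correct signed sum:

#include <cstdio>
#include <cuda_runtime.h>

static __device__ inline unsigned long long int llitoulli(long long int l)
{
    unsigned long long int u;
    asm("mov.b64 %0, %1;" : "=l"(u) : "l"(l));
    return u;
}

// Each thread atomically adds a (possibly negative) 64-bit value to *pInt64.
__global__ void accumulate(long long int* pInt64, long long int a)
{
    atomicAdd((unsigned long long int*)pInt64, llitoulli(a));
}

int main()
{
    long long int* d_sum = nullptr;
    cudaMalloc(&d_sum, sizeof(long long int));
    cudaMemset(d_sum, 0, sizeof(long long int));

    accumulate<<<1, 256>>>(d_sum, -3);   // 256 threads each add -3

    long long int h_sum = 0;
    cudaMemcpy(&h_sum, d_sum, sizeof(long long int), cudaMemcpyDeviceToHost);
    printf("sum = %lld\n", h_sum);       // expected: -768
    cudaFree(d_sum);
    return 0;
}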

More than double the shared memory per block on 9.0 is something I'd find pretty useful.

Except that shared memory is a crippled shadow of what it once was pre-7.x, and all my high-performance code now relies almost entirely on the register file and synchronous warp collectives, relegating SMEM to L1 instead. Six years in, I have yet to see the benefit of race conditions within warps in real code. To me, H100's value proposition is distributed computing on a single DGX server or a cluster of them, rather than minor differences in CUDA device properties and a few bespoke instructions.
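
As an illustration of what I mean by keeping everything in the register file with synchronous warp collectives, here is a minimal sketch of my own (names and sizes are just for the example): a warp-level sum reduction built on __shfl_down_sync that never touches shared memory, with one atomic per warp to combine results:

#include <cstdio>
#include <cuda_runtime.h>

// Sum a value across the 32 threads of a warp using only registers.
__device__ inline int warpReduceSum(int v)
{
    // 0xffffffff: all lanes participate; the _sync variant avoids the
    // implicit warp-synchronous races that were tolerated before Volta.
    for (int offset = 16; offset > 0; offset >>= 1)
        v += __shfl_down_sync(0xffffffff, v, offset);
    return v;  // lane 0 now holds the warp total
}

__global__ void sumKernel(const int* in, int* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int v = (i < n) ? in[i] : 0;
    v = warpReduceSum(v);
    if ((threadIdx.x & 31) == 0)   // one atomic per warp, not per thread
        atomicAdd(out, v);
}

int main()
{
    const int n = 1024;
    int h_in[n];
    for (int i = 0; i < n; ++i) h_in[i] = 1;

    int *d_in, *d_out;
    cudaMalloc(&d_in, n * sizeof(int));
    cudaMalloc(&d_out, sizeof(int));
    cudaMemcpy(d_in, h_in, n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemset(d_out, 0, sizeof(int));

    sumKernel<<<(n + 255) / 256, 256>>>(d_in, d_out, n);

    int h_out = 0;
    cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    printf("sum = %d\n", h_out);   // expected: 1024
    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}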

I believe you, for sure, and I did get 8.9 from the TensorFlow device list. I have now compiled TensorFlow 2.12 with CUDA 12 and cuDNN 8.8. Thanks!

Thanks, this gave me a deeper understanding of compute capability!