Mismatch of CUDA capabilities version for RTX A4000

Hi all, I’ve been troubleshooting an issue with our RTX A4000. We were relying on the output of cudaGetDeviceProperties() to get the CUDA capability version that the GPU supported. Here’s some code I used to get the CUDA capability version:

detect.cu
#include <cstdio>
#include <cuda_runtime.h>

void printDeviceProperties(const cudaDeviceProp& prop) {
    printf("Device Name: %s\n", prop.name);
    printf("Compute Capability: %d.%d\n", prop.major, prop.minor);
}

int main() {
    int count = 0;
    // Bail out if the runtime can't enumerate GPUs or none are present.
    if (cudaSuccess != cudaGetDeviceCount(&count)) return -1;
    if (count == 0) return -1;

    for (int device = 0; device < count; ++device) {
        cudaDeviceProp prop;
        if (cudaSuccess == cudaGetDeviceProperties(&prop, device)) {
            printf("Device %d Properties:\n", device);
            printDeviceProperties(prop);
        }
    }

    return 0;
}

If I run that with either CUDA 11.4 or 12.8, I get:

Device 0 Properties:
Device Name: NVIDIA RTX 4000 Ada Generation
Compute Capability: 8.9

I’m running that with either:

/usr/local/cuda-12.8/bin/nvcc --run detect.cu
/usr/local/cuda-11.4/bin/nvcc --run detect.cu

The problem is that this GPU does NOT support compute capability 8.9. I only discovered this after checking the table here: CUDA GPUs - Compute Capability | NVIDIA Developer

It actually only supports compute capability 8.6. The table is definitely correct. I know because I was compiling with capability 8.7 (because another GPU we have supported 8.7), but it wouldn’t run on the RTX A4000. As soon as I switched it to 8.6, it worked just fine.

So my question is: Is there a reliable way to determine the CUDA capability version for a GPU programmatically? Obviously relying on cudaGetDeviceProperties() isn’t reliable in this case.

Thanks!

The RTX 4000 card, as reported by the CUDA API, does have compute capability 8.9.
The RTX A4000 has compute capability 8.6.

Now the question is whether the card which you are using is indeed an RTX 4000 or an RTX A4000. If there is a mismatch between the card and the api output, I would suggest filing a bug.

Can you tell me where you got those values from exactly? The only RTX 4000 I see listed on the capabilities page is the Quadro RTX 4000, but that has a compute capability of 7.5.

The GPU itself has Model: PG190A on the back, and if I search for that in Google, it returns the RTX A4000 page: RTX A4000 Graphics Card | NVIDIA (Although I don’t see anywhere on that page that it lists the model number.)

Sorry, I did not copy the full name.
I am talking about this card: NVIDIA RTX 4000 Ada Generation Graphics Card

The model number is listed as “PG190 SKU 510” here.

This ebay offer suggests that PG190A is indeed the RTX 4000 Ada GPU.

Thanks, is there an official location where the compute capability is listed? The datasheet doesn’t mention it, and the NVIDIA page on CUDA compute capability doesn’t explicitly list the RTX A4000 Ada Generation.

It definitely didn’t work when I compiled with 8.7 (which is below 8.9). But it worked fine with 8.6.

I cannot find it either. However, the Ada Lovelace architecture has CC 8.9.
CC 8.6 and 8.7 belong to the Ampere architecture.

Another method to ID it would be to run:

lspci -vv

and compare the device ID to the list at the bottom of the NVIDIA Open driver page here.
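As a shortcut, a sketch of how to pull out just the NVIDIA PCI IDs (assuming a standard Linux lspci; 10de is NVIDIA’s PCI vendor ID):

```shell
# List only NVIDIA devices, with numeric IDs shown.
# The [10de:xxxx] pair in each line is the vendor:device ID to look up.
lspci -nn -d 10de:
```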

Thanks. It looks like it is the RTX A4000 Ada Generation. Here’s the device info from lspci -vv for that device:

0000:01:00.0 VGA compatible controller: NVIDIA Corporation Device 27b2 (rev a1) (prog-if 00 [VGA controller])
        Subsystem: NVIDIA Corporation Device 181b
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 427
        NUMA node: 0
        Region 0: Memory at 70000000 (32-bit, non-prefetchable) [size=16M]
        Region 1: Memory at 3c0000000000 (64-bit, prefetchable) [size=256M]
        Region 3: Memory at 3c0010000000 (64-bit, prefetchable) [size=32M]
        Region 5: I/O ports at <unassigned> [disabled]
        Expansion ROM at 71000000 [virtual] [disabled] [size=512K]
        Capabilities: <access denied>
        Kernel driver in use: nvidia
        Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia

On that GitHub page, 27b2 is listed as RTX 4000 Ada Generation.

There are at least 3 different GPUs with similar-sounding names.

RTX 4000 - Turing Generation - cc7.5
RTX A4000 - Ampere Generation - cc8.6
RTX 4000 Ada Generation - Ada Generation - cc8.9

Referring to something as “RTX A4000 Ada Generation” is not correct and is not part of NVIDIA naming convention; there is no such animal.

Your GPU in the device property output reports itself as RTX 4000 Ada Generation, and it reports cc8.9.

That is consistent. Notice there is no “A4000” being reported there. So looking at a table entry for “A4000” is not correct and not relevant, for that GPU.

Ah, thanks for catching that. It’s kind of confusing with the very similar names, and I inadvertently typed the A! Understood that we have the RTX 4000 Ada Generation.

I’m still not sure where the definitive place is to look up the model number (on the back of the unit), the marketing name, and the CUDA compute capability version.

Also, I guess I still don’t understand how the CUDA compute capability version works. I assumed that it was backwards-compatible, so if I compiled to 8.7, it should work for the Nvidia RTX 4000 Ada Generation (because that GPU supports 8.9). But it didn’t work - it only worked when I compiled with 8.6. (Haven’t tried 8.9 yet.)

Correct, as a general rule, provided the binary also has PTX present, but I’m not sure in this particular case.

CC8.7 is specific to a range of Jetson devices, which are quite a different beast to a standalone GPU.


Ah, that’s really helpful! The reason I chose CC8.7 was that the other GPU I was testing with was a Jetson. So I thought by compiling with 8.7 (which was less than the 8.9 that the Nvidia RTX 4000 Ada Generation supported) it would work on both. And it did work on the Jetson, just not on the RTX 4000.

As @rs277 stated, the binary needs PTX included (so the graphics driver can compile PTX on-the-fly for your GPU). It depends on whether you specify a virtual or a real architecture (compute_xx or sm_xx) and on the nvcc options.
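To make that concrete, here is a sketch of the usual -gencode combinations (option names as documented for nvcc; the shorthand note reflects my understanding of the docs):

```shell
# SASS only for sm_86: runs on cc 8.6 GPUs (and, in general, later GPUs
# of the same major version), but there is no PTX for the driver to JIT.
nvcc -gencode arch=compute_86,code=sm_86 detect.cu -o detect

# PTX only for compute_86: the driver JIT-compiles it at load time for
# the actual GPU, giving forward compatibility at some startup cost.
nvcc -gencode arch=compute_86,code=compute_86 detect.cu -o detect

# Shorthand: -arch=sm_86 embeds both sm_86 SASS and compute_86 PTX.
nvcc -arch=sm_86 detect.cu -o detect
```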


The marketing name and compute capability version are in the device property output. And the table you linked is already the best source I know of for online lookup. I acknowledge it is not perfect.

The PG190 number, if that is what you are referring to as the model number, is not something that is reliably and publicly documented by NVIDIA. It’s not intended by NVIDIA as a way to refer to the unit, at least not in a public setting. Therefore I would suggest there is no “definitive” source for it. The internet collections covering it are ad hoc, to my knowledge.
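As an additional programmatic cross-check, reasonably recent drivers can report the compute capability directly through nvidia-smi (a sketch; the compute_cap query field needs a newer driver, roughly R510 or later):

```shell
# Print the marketing name and compute capability per GPU from the driver.
nvidia-smi --query-gpu=name,compute_cap --format=csv
```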


Thanks - I’m not familiar with PTX; I’ll have to do a bit of research on it. In general, do you think it’ll be possible for me to compile a single binary that uses CC8.7 so that it can be run on both GPUs?

The standard CUDA approach to the lack of binary compatibility between GPU architectures is to create a fat binary that includes binary images for all GPU architectures one wishes to support, plus PTX for the most recent GPU architecture to provide some degree of forward compatibility with future architectures through JIT compilation.
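A sketch of such a fat binary build covering the GPUs mentioned in this thread (adjust the -gencode list to the architectures you actually need; sm_87 is the Jetson part):

```shell
# Embed SASS for cc 8.6 (Ampere), 8.7 (Jetson), and 8.9 (Ada), plus
# compute_89 PTX so future architectures can be JIT-compiled at load time.
nvcc detect.cu -o detect \
  -gencode arch=compute_86,code=sm_86 \
  -gencode arch=compute_87,code=sm_87 \
  -gencode arch=compute_89,code=sm_89 \
  -gencode arch=compute_89,code=compute_89
```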
