Mismatch of CUDA capabilities version for RTX A4000

Hi all, I’ve been troubleshooting an issue with our RTX A4000. We were relying on the output of cudaGetDeviceProperties() to get the CUDA capability version that the GPU supported. Here’s some code I used to get the CUDA capability version:

detect.cu
#include <cstdio>
#include <cuda_runtime.h>

void printDeviceProperties(const cudaDeviceProp& prop) {
    printf("Device Name: %s\n", prop.name);
    printf("Compute Capability: %d.%d\n", prop.major, prop.minor);
}

int main() {
    int count = 0;
    // Bail out if the runtime can't enumerate GPUs or none are present.
    if (cudaSuccess != cudaGetDeviceCount(&count)) return -1;
    if (count == 0) return -1;

    for (int device = 0; device < count; ++device) {
        cudaDeviceProp prop;
        if (cudaSuccess == cudaGetDeviceProperties(&prop, device)) {
            printf("Device %d Properties:\n", device);
            printDeviceProperties(prop);
        }
    }

    return 0;
}

If I run that with either CUDA 11.4 or 12.8, I get:

Device 0 Properties:
Device Name: NVIDIA RTX 4000 Ada Generation
Compute Capability: 8.9

I’m running that with either:

/usr/local/cuda-12.8/bin/nvcc --run detect.cu
/usr/local/cuda-11.4/bin/nvcc --run detect.cu

The problem is that this GPU does NOT support compute capability 8.9. I only discovered this after checking the table here: CUDA GPUs - Compute Capability | NVIDIA Developer

It actually only supports compute capability 8.6. The table is definitely correct. I know because I was compiling with capability 8.7 (because another GPU we have supported 8.7), but it wouldn’t run on the RTX A4000. As soon as I switched it to 8.6, it worked just fine.

So my question is: Is there a reliable way to determine the CUDA capability version for a GPU programmatically? Obviously relying on cudaGetDeviceProperties() isn’t reliable in this case.

Thanks!

The RTX 4000 card, as reported by the CUDA API, does have compute capability 8.9.
The RTX A4000 has compute capability 8.6.

Now the question is whether the card which you are using is indeed an RTX 4000 or an RTX A4000. If there is a mismatch between the card and the api output, I would suggest filing a bug.

Can you tell me where you got those values from exactly? The only RTX 4000 I see listed on the capabilities page is the Quadro RTX 4000, but that has a compute capability of 7.5.

The GPU itself has Model: PG190A on the back, and if I search for that in Google, it returns the RTX A4000 page: RTX A4000 Graphics Card | NVIDIA (Although I don’t see anywhere on that page that it lists the model number.)

Sorry, I did not copy the full name.
I am talking about this card: NVIDIA RTX 4000 Ada Generation Graphics Card

The model number is listed as “PG190 SKU 510” here.

This ebay offer suggests that PG190A is indeed the RTX 4000 Ada GPU.

Thanks, is there an official location where the compute capability is listed? The datasheet doesn’t mention it, and the NVIDIA page on CUDA compute capability doesn’t explicitly list the RTX A4000 Ada Generation.

It definitely didn’t work when I compiled with 8.7 (which is below 8.9). But it worked fine with 8.6.

I cannot find it either. However, the Ada Lovelace architecture has CC 8.9.
CC 8.6 and 8.7 belong to the Ampere architecture.

Another method to ID it would be to run:

lspci -vv

and compare the device ID to the list at the bottom of the NVIDIA Open driver page here.
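As a shortcut, a sketch of how to pull out just the NVIDIA PCI IDs (assuming a standard Linux lspci; 10de is NVIDIA’s PCI vendor ID):

```shell
# List only NVIDIA devices, with numeric IDs shown.
# The [10de:xxxx] pair in each line is the vendor:device ID to look up.
lspci -nn -d 10de:
```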

Thanks. It looks like it is the RTX A4000 Ada Generation. Here’s the device info from lspci -vv for that device:

0000:01:00.0 VGA compatible controller: NVIDIA Corporation Device 27b2 (rev a1) (prog-if 00 [VGA controller])
        Subsystem: NVIDIA Corporation Device 181b
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 427
        NUMA node: 0
        Region 0: Memory at 70000000 (32-bit, non-prefetchable) [size=16M]
        Region 1: Memory at 3c0000000000 (64-bit, prefetchable) [size=256M]
        Region 3: Memory at 3c0010000000 (64-bit, prefetchable) [size=32M]
        Region 5: I/O ports at <unassigned> [disabled]
        Expansion ROM at 71000000 [virtual] [disabled] [size=512K]
        Capabilities: <access denied>
        Kernel driver in use: nvidia
        Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia

On that GitHub page, 27b2 is listed as RTX 4000 Ada Generation.

There are at least 3 different GPUs with similar-sounding names.

RTX 4000 - Turing Generation - cc7.5
RTX A4000 - Ampere Generation - cc8.6
RTX 4000 Ada Generation - Ada Generation - cc8.9

Referring to something as “RTX A4000 Ada Generation” is not correct and is not part of NVIDIA naming convention; there is no such animal.

Your GPU in the device property output reports itself as RTX 4000 Ada Generation, and it reports cc8.9.

That is consistent. Notice there is no “A4000” being reported there. So looking at a table entry for “A4000” is not correct and not relevant, for that GPU.

Ah, thanks for catching that. It’s kind of confusing with the very similar names, and I inadvertently typed the A! Understood that we have the RTX 4000 Ada Generation.

I’m still not sure where the definitive place is to look up the model number (on the back of the unit), the marketing name, and the CUDA compute capability version.

Also, I guess I still don’t understand how the CUDA compute capability version works. I assumed that it was backwards-compatible, so if I compiled to 8.7, it should work for the Nvidia RTX 4000 Ada Generation (because that GPU supports 8.9). But it didn’t work - it only worked when I compiled with 8.6. (Haven’t tried 8.9 yet.)

Correct, as a general rule, provided the binary also has PTX present, but I’m not sure in this particular case.

CC8.7 is specific to a range of Jetson devices, which are quite a different beast to a standalone GPU.


Ah, that’s really helpful! The reason I chose CC8.7 was that the other GPU I was testing with was a Jetson. So I thought by compiling with 8.7 (which was less than the 8.9 that the Nvidia RTX 4000 Ada Generation supported) it would work on both. And it did work on the Jetson, just not on the RTX 4000.

As @rs277 stated, the binary needs PTX included (so the graphics driver can compile PTX on-the-fly for your GPU). It depends on whether you specify a virtual or a real architecture (compute_xx or sm_xx) and on the nvcc options.
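To make that concrete, here is a sketch of the usual -gencode combinations (option names as documented for nvcc; the shorthand note reflects my understanding of the docs):

```shell
# SASS only for sm_86: runs on cc 8.6 GPUs (and, in general, later GPUs
# of the same major version), but there is no PTX for the driver to JIT.
nvcc -gencode arch=compute_86,code=sm_86 detect.cu -o detect

# PTX only for compute_86: the driver JIT-compiles it at load time for
# the actual GPU, giving forward compatibility at some startup cost.
nvcc -gencode arch=compute_86,code=compute_86 detect.cu -o detect

# Shorthand: -arch=sm_86 embeds both sm_86 SASS and compute_86 PTX.
nvcc -arch=sm_86 detect.cu -o detect
```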


The marketing name and compute capability version are in the device property output. And the table you linked is already the best source I know of for online lookup. I acknowledge it is not perfect.

The PG190 number, if that is what you are referring to as the model number, is not something that is reliably and publicly documented by NVIDIA. It’s not intended by NVIDIA as a way to refer to the unit, at least not in a public setting. Therefore I would suggest there is no “definitive” source for it. The internet collections covering it are ad hoc, to my knowledge.
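As an additional programmatic cross-check, reasonably recent drivers can report the compute capability directly through nvidia-smi (a sketch; the compute_cap query field needs a newer driver, roughly R510 or later):

```shell
# Print the marketing name and compute capability per GPU from the driver.
nvidia-smi --query-gpu=name,compute_cap --format=csv
```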


Thanks - I’m not familiar with PTX; I’ll have to do a bit of research on it. In general, do you think it’ll be possible for me to compile a single binary that uses CC8.7 so that it can be run on both GPUs?

The standard CUDA approach to the lack of binary compatibility between GPU architectures is to create a fat binary that includes binary images for all GPU architectures one wishes to support, plus PTX for the most recent GPU architecture to provide some degree of forward compatibility with future architectures through JIT compilation.
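A sketch of such a fat binary build covering the GPUs mentioned in this thread (adjust the -gencode list to the architectures you actually need; sm_87 is the Jetson part):

```shell
# Embed SASS for cc 8.6 (Ampere), 8.7 (Jetson), and 8.9 (Ada), plus
# compute_89 PTX so future architectures can be JIT-compiled at load time.
nvcc detect.cu -o detect \
  -gencode arch=compute_86,code=sm_86 \
  -gencode arch=compute_87,code=sm_87 \
  -gencode arch=compute_89,code=sm_89 \
  -gencode arch=compute_89,code=compute_89
```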
