Check the ROP unit count under Linux? Affects all RTX 50XX cards

Hi,

Is there a way to get the actual number of ROP units on a GPU under Linux? I am asking because of reports that some 50xx GPUs have shipped with a reduced number of ROPs, and this morning the first 5080 with missing ROPs was discovered.

NVIDIA has already acknowledged the problem and confirmed that it is grounds for replacement.

However, so far this can only be detected with the GPU-Z application under Windows. Is there a way to get the info under Linux? I tried nvidia-smi -q, but it does not seem to show that number. Thanks a lot for any suggestion that does not involve installing Windows.
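
For anyone who wants to repeat the nvidia-smi check without reading the whole dump by eye, here is a minimal Python sketch. It only filters the output of nvidia-smi -q for anything that mentions ROPs or raster operations. The function names and the search pattern are my own assumptions, not anything documented by NVIDIA, and per the above the expected result is no hits.

```python
import re
import shutil
import subprocess

# Hypothetical search pattern: "ROP"/"ROPs" as a whole word, or "raster operation".
ROP_PATTERN = re.compile(r"\bROPs?\b|raster\s+operation", re.IGNORECASE)


def find_rop_lines(text):
    """Return the stripped lines of `text` that mention ROPs."""
    return [line.strip() for line in text.splitlines() if ROP_PATTERN.search(line)]


def query_nvidia_smi():
    """Return the output of `nvidia-smi -q`, or None if the tool is not installed."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi", "-q"],
        capture_output=True,
        text=True,
        check=False,
        timeout=60,
    )
    return result.stdout


if __name__ == "__main__":
    dump = query_nvidia_smi()
    if dump is None:
        print("nvidia-smi not found on PATH")
    else:
        hits = find_rop_lines(dump)
        print("\n".join(hits) if hits else "no ROP-related lines in nvidia-smi -q")
```

The word-boundary match is there so that unrelated words containing "rop" are not reported as hits.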

I have been looking into this too. Have you found anything yet?

I’m in the CUDA world, which doesn’t really touch graphics capability. I did wonder whether Nsight Graphics, the profiler, lists those stats somewhere, but I haven’t looked.

Even if it did, it could of course use a lookup table for the specs, rather than the actual number.

So far the only suggestion I have had is to run GPU-Z from a portable USB key with Windows installed. The best solution would be for nvidia-smi to report the same information.

It just so happens that the first case of missing ROPs on a 5080 has been discovered. The other approach would be to check how this affects the graphics benchmarks available on Linux, such as the Unigine series. I will have a look at the Phoronix OpenBenchmarking.org site.

Thank you for the tip. Worth a try, I suppose. But you are right, there is also the risk that they display the info from a database, not from the driver. And at this stage we do not even know whether this info is built into the Linux driver at all.

I’ve just grepped the entire driver for ROP and the Raster.Output.Pipeline regex and found nothing.

nv-kernel.o_binary contains a single instance of ROP but it looks like it’s accidental.
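
If someone wants to repeat the search, this is roughly what it amounts to, as a minimal Python sketch. It scans every file under a directory for the same two patterns and prints a few bytes of context around each hit, which is how you can judge whether a hit is accidental (for instance, ROP sitting inside a longer uppercase word). The function names and the 16-byte context window are arbitrary choices of mine, and the root directory is whatever you extracted the driver package to.

```python
import re
import sys
from pathlib import Path

# Same two patterns as in the grep: literal "ROP" and "Raster.Output.Pipeline",
# where "." matches any single byte except a newline.
PATTERN = re.compile(rb"ROP|Raster.Output.Pipeline")


def scan_bytes(data, context=16):
    """Return (offset, surrounding bytes) for every pattern match in `data`."""
    hits = []
    for match in PATTERN.finditer(data):
        start = max(match.start() - context, 0)
        end = min(match.end() + context, len(data))
        hits.append((match.start(), data[start:end]))
    return hits


def scan_tree(root):
    """Scan every readable regular file under `root`; map path -> list of hits."""
    results = {}
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        try:
            hits = scan_bytes(path.read_bytes())
        except OSError:
            continue  # unreadable file, skip it
        if hits:
            results[str(path)] = hits
    return results


if __name__ == "__main__" and len(sys.argv) > 1:
    for name, hits in scan_tree(sys.argv[1]).items():
        for offset, snippet in hits:
            print(f"{name} @ {offset}: {snippet!r}")
```

Note that this only finds plain strings. A value that is computed or fetched by numeric ID would not show up this way.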

It seemingly is neither retrieved nor exported directly. Linux is once again left crippled.

Like it was said earlier, you could install Windows on a flash drive, boot from it, and run GPU-Z. I see no other options unless you reverse engineer the GSP firmware to find out the API call to fetch this info. AFAIK, the firmware is the same between Linux and Windows.

Thank you very much for the feedback. I also grepped vulkaninfo outputs from 5080 and 5090 (found them on openbenchmarking.org) and I see no differences, apart from the device ID and memory heap sizes (32GB for the 5090 and 16GB for the 5080, as expected). So it does not seem to provide any useful info either in that regard :/ Unless nvidia engineers help with some API call, Windows on a portable drive seems to be the solution.
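
In case anyone wants to redo the comparison on their own dumps, here is a minimal Python sketch of the diff using the standard difflib module. The function name and the labels are placeholders of mine. It returns only the lines that differ between two vulkaninfo text dumps, so device ID and heap sizes show up and everything identical is dropped.

```python
import difflib


def diff_dumps(a_text, b_text, a_name="a", b_name="b"):
    """Return only the differing lines between two text dumps.

    Lines present only in the first dump are prefixed with "-",
    lines present only in the second dump with "+".
    """
    a_lines = [line.rstrip() for line in a_text.splitlines()]
    b_lines = [line.rstrip() for line in b_text.splitlines()]
    diff = list(
        difflib.unified_diff(a_lines, b_lines, a_name, b_name, n=0, lineterm="")
    )
    # The first two entries are the "---"/"+++" file headers; "@@" lines are
    # hunk markers. Everything else is an actual changed line.
    return [line for line in diff[2:] if not line.startswith("@@")]


if __name__ == "__main__":
    import sys

    if len(sys.argv) == 3:
        with open(sys.argv[1]) as fa, open(sys.argv[2]) as fb:
            for line in diff_dumps(fa.read(), fb.read(), sys.argv[1], sys.argv[2]):
                print(line)
```

An empty result means the two dumps are identical apart from trailing whitespace. It says nothing about values that vulkaninfo does not print in the first place.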

ChatGPT continues to be stupid as hell.

It knows everything, but far too often you have to nudge it towards the right answer.

So the information should be there, but God knows how to retrieve it. Even on Windows, nvidia-smi is woefully incomplete. No idea how W1zzard from TechPowerUp managed to find the hidden API calls to extract it. Perhaps he could port GPU-Z to Linux, but given the small number of Linux users and the lack of anything like Win32, I suspect he will be reluctant to do so.

We could maybe ask the GPU-Z authors which API call they use to retrieve that info. I suppose that nvidia-smi also calls the driver's public API?

I’m 99.99% sure those calls are under NDA and/or are a trade secret, otherwise NVIDIA would have exposed this info themselves in nvidia-smi.

GPU-Z would have access to those?

Otherwise, maybe in NVAPI? NVAPI Reference Documentation
There are a few “Raster” entries in the API, but it is not clear if they return the ROP number.

Or maybe in the CUDA driver API?

So I asked directly on the GPU-Z forum. The response from the dev was: “If it’s not listed in the public docs it’s under NDA. Only way is to reach out to NVIDIA for NDA access to NVAPI”.
He did not say whether he has NDA access himself, but I suppose so. At least we know he uses NVAPI.

@aplattner

Aaron, is there a specific reason why these NVAPI calls are under NDA?

I cannot understand why NVIDIA would hide them. We already have GPU-Z, which uses them, and you cannot call them unless the card is already supported by the driver. This information is not something worth hiding. It’s just confusing. I understand your stance on Hot Spot temperature (could be something you never intended to export) or VRAM temperature (AFAIK there are different ways to access it and they are all proprietary).

Can’t believe you asked ChatGPT lol.

If you have anything useful to contribute regarding the original topic, please do not hesitate to share. Otherwise, how about you GTFO?

This may be a stupid idea (I have no clue), but would it be possible to run GPU-Z with Wine on Linux?

Actually GPU-Z works under Wine (GPU-Z 2.62/Wine 10.0) but I cannot vouch for the correctness of the displayed data.

AFAIK W1zzard has said somewhere that when the respective NVAPI calls fail, GPU-Z uses its internal data instead, which means you cannot trust its output under Linux/Wine. Wine would have to route low-level NVAPI calls to the underlying NVIDIA Linux libraries, and I’m far from certain it actually does that. I’m almost sure it does not.

As you can see CUDA, DirectCompute, DirectML, Ray Tracing and PhysX are all missing even though CUDA and Ray Tracing are perfectly supported under Linux.

Even the driver version is incorrect. It displays 536.25, I have 565.77 installed.
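
The version mismatch is an easy sanity check to automate. Below is a minimal Python sketch that compares a version string reported by some tool (GPU-Z under Wine here) against what the loaded kernel module reports. It assumes the NVIDIA kernel module is loaded and exposes /proc/driver/nvidia/version with a line starting with "NVRM version". The function names are mine.

```python
import re
import sys
from pathlib import Path

VERSION_FILE = Path("/proc/driver/nvidia/version")
# Driver versions look like 565.77 or 550.54.14.
VERSION_RE = re.compile(r"\b(\d+\.\d+(?:\.\d+)?)\b")


def parse_driver_version(text):
    """Extract the driver version from the 'NVRM version' line, or None."""
    for line in text.splitlines():
        if line.startswith("NVRM version"):
            match = VERSION_RE.search(line)
            if match:
                return match.group(1)
    return None


def matches_installed(reported, installed_text):
    """True only if `reported` equals the version found in `installed_text`."""
    installed = parse_driver_version(installed_text)
    return installed is not None and installed == reported


if __name__ == "__main__" and len(sys.argv) > 1:
    if VERSION_FILE.exists():
        ok = matches_installed(sys.argv[1], VERSION_FILE.read_text())
        print("matches loaded driver" if ok else "does NOT match loaded driver")
    else:
        print(f"{VERSION_FILE} not found, is the NVIDIA module loaded?")
```

If the reported version does not match, as with 536.25 versus 565.77 here, the tool is clearly not reading from the real driver, so its other values should not be trusted either.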

No, on Reddit some other users have reported that it does not work under Wine. Worse, as birdie explained, GPU-Z falls back to an internal database when the driver does not report a value, which leads to misleading data. Hence, even under Windows, the driver has to be installed for GPU-Z to show the reduced ROP count. Otherwise it would show the normal number even on a defective card.

I will try to build a Live Windows on USB stick with Ventoy (as explained here: How to Run Windows From a USB Drive | PCMag) and report back if I am successful.

Again, help from NVIDIA would be good here. I looked at NVAPI, and it is not clear what each function returns: you get handles and some data, so maybe the data is there even without an NDA, just not explicitly described.

Hey, I’m that guy on Reddit and I can confirm it was populating the wrong data. It was showing the wrong version number and said I had more ROPs than I should. I’m hoping someone can get the live cd working.

On March 1st, when the new ISO is released, I’ll be doing a system wipe and reinstalling Linux. Before I reinstall Linux I’m going to sacrifice the virginity of one of my drives and install Windows to check the value. If I have the correct number, maybe we can find a benchmark to run and compare values.

I did see something about an nvidia debug tool but I haven’t had time to look into it.

Hey thanks for the feedback.

I have seen some tutorials on how to create a live Windows on a USB drive and will start trying this today. If others are interested, these are the ones I saw:

The first two use the same method, which is what I will try first. Note that although you create a VHD, the OS runs on bare metal, not in a VM. This matters for GPU-Z, which does not work in a VM either.

The key point is to run a live Windows OS from the stick, not an installation image. That way you do not need to fiddle with the bootloader and partitions.