Processing speed difference between driver versions 537.58 and 472.12

This is an inquiry about GPU processing time.

GPU processing time is up to 70ms slower with driver version 537.58 than with driver version 472.12.

I want to understand the cause.

Device used: NVIDIA RTX A4000

Drivers used: NVIDIA RTX/Quadro Desktop and Notebook Driver Release R470 U4 (472.12) and R535 U7 (537.58)

Details: I found that the processing time (takt time) above changes depending on the driver version.

The difference in takt time between the two versions comes from the computation speed when the GPU starts up.

Further investigation revealed that with version 472.12, Memory-Usage was 183MB even when no GPU processes were running, and the GPU was constantly idling.

With version 537.58, on the other hand, Memory-Usage was 0MB, and the GPU behaved as if it started up only after a process arrived.

This difference seems to be caused by the driver, but could you please tell me more about the cause of this behavior?

A GPU that is in WDDM mode pretty much cannot have 0 memory usage. A GPU that is in TCC mode generally will not have much if any memory usage when a compute process is not running on it.

So your description appears to me to fit the description of a GPU being in TCC mode in the case of 537.58, and WDDM mode in the case of 472.12.

If that were the case, it would result in a variety of differences in behavior, and one of those differences may be in the area of application start-up and time costs to “wake up” the GPU.
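To check which driver model is actually active alongside the reported memory usage, one option is to query nvidia-smi directly rather than reading its default table. This is a minimal sketch, assuming nvidia-smi is on the PATH and that the `driver_model.current`, `memory.used` and `memory.total` query fields (listed by `nvidia-smi --help-query-gpu`) are available on the installed driver; the helper names `parse_smi_csv` and `query_gpus` are hypothetical.

```python
import subprocess
from typing import Dict, List

# Query fields for nvidia-smi's --query-gpu option.
# driver_model.current reports WDDM or TCC on Windows (N/A on Linux).
FIELDS = ["index", "driver_version", "driver_model.current",
          "memory.used", "memory.total"]


def parse_smi_csv(text: str) -> List[Dict[str, str]]:
    """Parse `--format=csv,noheader,nounits` output into one dict per GPU."""
    rows: List[Dict[str, str]] = []
    for line in text.strip().splitlines():
        values = [v.strip() for v in line.split(",")]
        if len(values) != len(FIELDS):
            raise ValueError(f"unexpected nvidia-smi line: {line!r}")
        rows.append(dict(zip(FIELDS, values)))
    return rows


def query_gpus() -> List[Dict[str, str]]:
    """Run nvidia-smi once and return the parsed per-GPU fields."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=" + ",".join(FIELDS),
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_smi_csv(out)
```

Calling `query_gpus()` on each driver version, before and after a test run, gives a record of the driver model and memory usage (in MiB) that can be compared side by side.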

Hello, Robert.
Thank you for answering.

I have already tried both WDDM mode and TCC mode.
When running with driver 537.58 and WDDM mode, Memory-Usage is always 0MB.
When running with driver 472.12 and WDDM mode, Memory-Usage is always 183MB.
When set to TCC mode, Memory-Usage was 0MB for both driver versions.

I think there is a causal relationship with the driver version, but what do you think?

I’ve personally never, ever seen a situation where a WDDM GPU reports a memory usage of 0. I don’t have any further comments. There may be something new in the newest drivers that I am unaware of; there have definitely been some new architectural developments in the driver lately (some of them Linux-only).

When I ran nvidia-smi.exe, Memory-Usage showed 0MiB.
Is there anything I can learn from this, or do you have any ideas on how to investigate further?

0MiB / 16376MiB looks wrong for a GPU with WDDM driver. You are running on bare metal and not on top of some virtualization software, correct?

Thank you for answering.
The GPU is running on bare metal.
The OS is Windows 10 IoT Enterprise LTSC2021.

If this is not the latest driver package for your platform, I would suggest installing the latest. nvidia-smi is part of the driver package, so driver and utility program should always be in sync.

If this is the latest driver package, you could consider filing a bug with NVIDIA, but it is not clear to me that a potential underreporting of GPU memory currently in use from nvidia-smi has any practical impact on using the GPU.

I am using the latest driver package 537.58 provided by NVIDIA.

The GPU test and the phenomenon that concerns me are as follows.
-GPU test-
1: Sleep for 10 seconds (GPU idle)
2: GPU calculation 50 times (A⇒B⇒C)
3: Sleep for 10 seconds (GPU idle)
4: GPU calculation 50 times (A⇒B⇒C)

-GPU calculation-
A: Allocate a 262MB data buffer on the CPU side (the buffer is reused from the second iteration onwards)
B: Allocate a 262MB data buffer on the GPU side (the buffer is reused from the second iteration onwards)
C: Copy from GPU to CPU
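The test sequence above can be sketched as a timing harness. This is a minimal sketch, not the poster's actual code: `copy_once` is a hypothetical callable that performs step C (a blocking GPU-to-CPU copy into the 262MB buffers already allocated in steps A and B) with whatever CUDA binding the application uses. The harness only handles the idle periods and the per-copy timing.

```python
import time
from typing import Callable, List


def run_test(copy_once: Callable[[], None],
             iterations: int = 50,
             idle_seconds: float = 10.0,
             rounds: int = 2) -> List[List[float]]:
    """Idle, then time `iterations` copies; repeat for `rounds` rounds.

    Returns one list of per-copy wall-clock times (ms) per round.
    `copy_once` must not return until the copy has completed; otherwise
    the timings only measure how long it takes to enqueue the copy.
    """
    results: List[List[float]] = []
    for _ in range(rounds):
        time.sleep(idle_seconds)              # steps 1 and 3: GPU left idle
        timings: List[float] = []
        for _ in range(iterations):           # steps 2 and 4: A => B => C
            start = time.perf_counter()
            copy_once()                       # step C: GPU -> CPU copy
            timings.append((time.perf_counter() - start) * 1000.0)
        results.append(timings)
    return results
```

Comparing the first few entries of each round against the rest, on both driver versions, would show whether the extra time is concentrated right after each idle period.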

When copying data from the GPU to the CPU, if the copy starts while Memory-Usage is 0MiB, the transfer rate immediately after the transfer begins is significantly lower.
(If the transfer continues for a while, the transfer rate gradually increases.)
When the transfer completes, Memory-Usage drops back to 0MiB, and the next transfer shows the same slowdown.
This phenomenon occurs with driver version 537.58 but not with driver version 472.12.
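To quantify the ramp-up described above (slow at the start of a transfer, faster later on), one option is to split a single transfer into chunks and time each chunk separately. In this sketch `copy_chunk(offset, size)` is a hypothetical blocking callable that copies `size` bytes starting at `offset`, and the chunk size is an arbitrary choice. Note that many small copies are not identical to one large copy, so this is a diagnostic approximation.

```python
import time
from typing import Callable, List


def chunk_throughput(copy_chunk: Callable[[int, int], None],
                     total_bytes: int,
                     chunk_bytes: int) -> List[float]:
    """Copy `total_bytes` in chunks and return each chunk's rate in GB/s."""
    if chunk_bytes <= 0:
        raise ValueError("chunk_bytes must be positive")
    rates: List[float] = []
    offset = 0
    while offset < total_bytes:
        size = min(chunk_bytes, total_bytes - offset)
        start = time.perf_counter()
        copy_chunk(offset, size)              # must block until this chunk is done
        elapsed = time.perf_counter() - start
        # Guard against a zero timer delta for very small chunks.
        rates.append(size / elapsed / 1e9 if elapsed > 0 else float("inf"))
        offset += size
    return rates
```

If the behavior matches the description, the first chunks after an idle period should show noticeably lower rates than later chunks on 537.58, while the rates on 472.12 should stay roughly flat.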

Why? :(

TL;DR: File a bug with NVIDIA regarding the performance regression. Make sure to present it well, because simply mentioning a 70ms difference in an application's execution speed makes it sound like a low-priority issue.
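One way to present the regression concretely is to report, per driver version, the time of the first copy after an idle period against the steady-state time of the copies that follow. The helper below is a hypothetical sketch of that reduction; it takes one round of per-copy timings in milliseconds and uses the median so that a single outlier does not dominate the steady-state figure.

```python
import statistics
from typing import List


def summarize_round(label: str, timings_ms: List[float]) -> str:
    """Contrast the first copy after idling with the median of the remaining copies."""
    if len(timings_ms) < 2:
        raise ValueError("need at least two timings")
    first = timings_ms[0]
    steady = statistics.median(timings_ms[1:])
    return (f"{label}: first copy {first:.1f} ms, "
            f"steady-state median {steady:.1f} ms, "
            f"penalty {first - steady:.1f} ms")
```

Pairing such lines for 472.12 and 537.58 on the same machine, together with the GPU model and OS version, gives a report that is more concrete than a single overall difference.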

NVIDIA does not publish the internal workings of their drivers. The only people who can readily answer the question “why?” would likely be a small group of NVIDIA software engineers familiar with the particular part of the driver that deals with the affected functionality. For what it is worth, there might be no causal relationship between the performance regression and the reported memory usage; if there is a link, it seems more likely that an underlying design change causes both of the observed effects. I cannot even speculate about what kind of design change that might be.

If you file a bug with NVIDIA regarding the performance regression observed, there is a reasonable chance that it will be fixed in the not too distant future. The chance that NVIDIA will explain “why” as part of the process is close to nil, however. NVIDIA engineering might be able to suggest a workaround, though.

Thank you for your advice.
I will consider whether to report a bug or not.