I have a system with two identical NVIDIA RTX A4000 GPUs. The display is initialized only on the second GPU. As expected, nvidia-smi reports the correct memory utilization, indicating that the second GPU has less free memory. However, I observed that the cudaMemGetInfo function from the CUDA runtime library returns the same values for both GPUs, which looks incorrect. In contrast, the nvmlDeviceGetMemoryInfo function from the NVML library matches what nvidia-smi reports. Which function should I rely on for an accurate estimate of free memory? My objective is to programmatically select the optimal GPU for high-performance computations based on available memory.
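For reference, this is roughly how I query both APIs (a minimal sketch with error checking omitted; it assumes the CUDA runtime and NVML are installed, and it matches the two enumerations via the PCI bus ID):

```cpp
#include <cstdio>
#include <cuda_runtime.h>
#include <nvml.h>

int main() {
    int deviceCount = 0;
    cudaGetDeviceCount(&deviceCount);
    nvmlInit();

    for (int i = 0; i < deviceCount; ++i) {
        // CUDA runtime report (creates a context on the selected device).
        size_t freeBytes = 0, totalBytes = 0;
        cudaSetDevice(i);
        cudaMemGetInfo(&freeBytes, &totalBytes);

        // NVML report for the same physical board, matched via PCI bus ID,
        // since NVML and CUDA enumeration orders need not agree.
        char busId[64];
        cudaDeviceGetPCIBusId(busId, sizeof(busId), i);
        nvmlDevice_t dev;
        nvmlDeviceGetHandleByPciBusId(busId, &dev);
        nvmlMemory_t mem;
        nvmlDeviceGetMemoryInfo(dev, &mem);

        printf("GPU %d: cudaMemGetInfo free = %zu MiB, NVML free = %llu MiB\n",
               i, freeBytes >> 20, (unsigned long long)(mem.free >> 20));
    }
    nvmlShutdown();
    return 0;
}
```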
Which operating system? Which driver model?
Windows, Driver Version: 537.70, Driver Model WDDM.
The WDDM driver model maintains a virtual memory system for the GPU memory, and it can oversubscribe that memory. When a display GPU is being used for CUDA, the CUDA subsystem uses the WDDM driver to get access to the GPU, so the report is a function of what WDDM is telling the CUDA subsystem. If WDDM chooses to report some value higher than what you believe is “actually available” on the GPU, there isn’t much you can do about it. It might be that WDDM intends to actually make that much memory available to CUDA, and via oversubscription, that may in fact be possible.
Thank you very much for the clarification. I still have two questions:
- Does cudaMemGetInfo return the correct values for the TCC driver model?
- Does nvmlDeviceGetMemoryInfo return correct values for both the WDDM and TCC driver models? As I mentioned above, I find that the free-memory estimate from nvmlDeviceGetMemoryInfo is quite accurate even for the WDDM driver model.
My claim is that all the functions in question return correct values, in all cases and situations.
I’ve explained why I think the WDDM case may not line up with your expectations or may not line up with some other function. That doesn’t mean it is incorrect, although I don’t wish to argue that point. Just stating my opinion.
I personally would not apply the “incorrect” label to anything I see here.
OK. Instead of correct/incorrect values, I should have asked which function gives the most reliable estimate of free memory for programmatically selecting the optimal GPU for high-performance computations. Which function can I rely on to achieve this?
Can you allocate as much memory as is stated by cudaMemGetInfo?
Is the performance worse than when allocating the amount from nvmlDeviceGetMemoryInfo?
I know you would like to get an official recommendation from Nvidia.
I’m not sure what that means. I use memory size as a measure of capability, not optimality.
If a set of GPUs are equally capable (i.e. the memory size is enough) then the determinant of optimality would be other factors like expected performance, which don’t have anything to do with memory size.
From a reliability standpoint, the only sense I can make is “can I reliably allocate the indicated memory?” No. You cannot. Not in any case, in any setting, using either of the functions, for any OS.
It’s well discussed on various forums that you cannot reliably allocate the full indicated amount of memory, whether that amount is indicated via NVML or via CUDA. If you then want to discuss “well, how close can I get to that limit?”, I have no information. I don’t know any reason that you could not get within some delta of the limit suggested by CUDA (or WDDM, if you prefer) or the limit suggested by NVML. I don’t know how to compare those deltas, except experimentally. I know of no documentation or guidance in this area.
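If you wanted to compare them experimentally, a crude probe could look something like the following sketch: start from the reported free amount and step the request down until a single cudaMalloc succeeds. The result will vary with fragmentation and with whatever else the OS/WDDM is doing at the time.

```cpp
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    size_t freeBytes = 0, totalBytes = 0;
    cudaMemGetInfo(&freeBytes, &totalBytes);

    const size_t step = 16ull << 20;   // back off in 16 MiB increments
    size_t request = freeBytes;
    void* ptr = nullptr;
    while (request > 0 && cudaMalloc(&ptr, request) != cudaSuccess) {
        cudaGetLastError();            // clear the error from the failed allocation
        request = (request > step) ? request - step : 0;
    }
    printf("reported free: %zu MiB, largest single allocation achieved: %zu MiB\n",
           freeBytes >> 20, request >> 20);
    if (ptr) cudaFree(ptr);
    return 0;
}
```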
Just to be clear, the original premise reported in this thread was that on WDDM, the report from cudaMemGetInfo may be higher than expected. I agree with that, and have observed it myself, and I believe it is due to the WDDM VMM including oversubscription, and I believe in some cases, from a CUDA perspective, you can allocate “based on” that info. That is, you can allocate more than you might otherwise think, based on your own accounting of what should be in the GPU memory. And I don’t know of any critical reasons not to do that. Sure, it will involve some memory swapping, but from an ML perspective, capability is king for the vast majority of users posting on forums, and more memory == more capability in the ML space in general. Perhaps also in HPC.
I am sorry for not providing sufficient background information earlier to clarify my situation. I have two identical GPUs and want to run two single-GPU applications, A and B. Application A involves both graphics rendering and numerical computations. During its execution, Application A launches Application B, which performs only numerical computations. However, I cannot control which GPU Application A selects for rendering.
To avoid a scenario where both applications run on the same GPU—causing operations to queue while the second GPU remains idle—I want to automatically determine the index of the idle GPU and ensure Application B runs on it. My only available metric for this decision is the estimated free memory of each GPU. I can reasonably assume that the GPU used by Application A for rendering and computations will have less free memory. Ideally, I could switch the non-display GPU’s driver mode to TCC, enabling me to use NVML to identify it. In this case, Application A would use the WDDM GPU for rendering. However, this approach may not be feasible for all users due to various constraints. As a result, I need a method to select the appropriate GPU even when both are operating in WDDM mode. The free memory estimation is purely for selecting the GPU; I am not concerned about whether I can fully allocate the available memory.
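For what it's worth, the selection logic I have in mind is roughly the following sketch: pick the GPU that NVML reports as having the most free memory and map it back to a CUDA device ordinal via its PCI bus ID (since the NVML and CUDA enumeration orders are not guaranteed to match). Error handling is omitted.

```cpp
#include <cstdio>
#include <nvml.h>
#include <cuda_runtime.h>

int main() {
    nvmlInit();
    unsigned int count = 0;
    nvmlDeviceGetCount(&count);

    unsigned long long bestFree = 0;
    char bestBusId[NVML_DEVICE_PCI_BUS_ID_BUFFER_SIZE] = {0};

    for (unsigned int i = 0; i < count; ++i) {
        nvmlDevice_t dev;
        nvmlDeviceGetHandleByIndex(i, &dev);
        nvmlMemory_t mem;
        nvmlDeviceGetMemoryInfo(dev, &mem);
        if (mem.free > bestFree) {
            nvmlPciInfo_t pci;
            nvmlDeviceGetPciInfo(dev, &pci);
            bestFree = mem.free;
            snprintf(bestBusId, sizeof(bestBusId), "%s", pci.busId);
        }
    }
    nvmlShutdown();

    // Translate the winning board into a CUDA device ordinal for application B.
    int cudaOrdinal = 0;
    cudaDeviceGetByPCIBusId(&cudaOrdinal, bestBusId);
    printf("Run application B on CUDA device %d\n", cudaOrdinal);
    return 0;
}
```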
Each GPU has a UUID which is queryable using NVML. Use that, along with host-based IPC of your choice, to let the pair of applications choose together how to run. In nearly all of these cases I suggest using deterministic methods rather than inferential methods to determine which GPU to run on.
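As a minimal sketch of that deterministic approach (error handling omitted): application A publishes the UUID of the GPU it is actually using over whatever host IPC you choose, and application B enumerates the UUIDs and picks the other one.

```cpp
#include <cstdio>
#include <nvml.h>

int main() {
    nvmlInit();
    unsigned int count = 0;
    nvmlDeviceGetCount(&count);
    for (unsigned int i = 0; i < count; ++i) {
        nvmlDevice_t dev;
        nvmlDeviceGetHandleByIndex(i, &dev);
        char uuid[NVML_DEVICE_UUID_BUFFER_SIZE];
        nvmlDeviceGetUUID(dev, uuid, sizeof(uuid));
        // Application A would publish its UUID over host IPC; application B
        // would select a device whose UUID differs from the published one.
        printf("NVML device %u: %s\n", i, uuid);
    }
    nvmlShutdown();
    return 0;
}
```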
I don’t have any further information about how to better interpret memory info.
Can you set an environment variable? → CUDA_VISIBLE_DEVICES
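For example, application B could restrict itself to the chosen GPU before its first CUDA runtime call. This is only a sketch: the variable must be set before CUDA initializes in that process, application A could equally place it in B's environment when launching it, and CUDA_VISIBLE_DEVICES also accepts GPU UUIDs as listed by nvidia-smi -L.

```cpp
#include <cstdlib>
#include <cstdio>
#include <string>
#include <cuda_runtime.h>

int main() {
    int chosen = 1;  // hypothetical: the index picked by the NVML free-memory check
    std::string value = std::to_string(chosen);
#ifdef _WIN32
    _putenv_s("CUDA_VISIBLE_DEVICES", value.c_str());
#else
    setenv("CUDA_VISIBLE_DEVICES", value.c_str(), 1);
#endif
    // From here on, the CUDA runtime enumerates only the chosen GPU, as device 0.
    cudaSetDevice(0);
    size_t freeBytes = 0, totalBytes = 0;
    cudaMemGetInfo(&freeBytes, &totalBytes);
    printf("visible GPU reports %zu MiB free\n", freeBytes >> 20);
    return 0;
}
```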