I have a system with two identical NVIDIA RTX A4000 GPUs. The display is initialized only on the second GPU. As expected, nvidia-smi reports the correct memory utilization, indicating that the second GPU has less free memory. However, I observed that the cudaMemGetInfo function from the CUDA runtime library returns the same values for both GPUs, which looks incorrect. In contrast, the nvmlDeviceGetMemoryInfo function from the NVML library matches what nvidia-smi reports. Which function should I rely on for an accurate estimate of free memory? My objective is to programmatically select the optimal GPU for high-performance computations based on available memory.
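For reference, this is roughly how I query both APIs (a minimal sketch with error checking omitted; it assumes the CUDA runtime and NVML are installed, and it matches the two enumerations via the PCI bus ID):

```cpp
#include <cstdio>
#include <cuda_runtime.h>
#include <nvml.h>

int main() {
    int deviceCount = 0;
    cudaGetDeviceCount(&deviceCount);
    nvmlInit();

    for (int i = 0; i < deviceCount; ++i) {
        // CUDA runtime report (creates a context on the selected device).
        size_t freeBytes = 0, totalBytes = 0;
        cudaSetDevice(i);
        cudaMemGetInfo(&freeBytes, &totalBytes);

        // NVML report for the same physical board, matched via PCI bus ID,
        // since NVML and CUDA enumeration orders need not agree.
        char busId[64];
        cudaDeviceGetPCIBusId(busId, sizeof(busId), i);
        nvmlDevice_t dev;
        nvmlDeviceGetHandleByPciBusId(busId, &dev);
        nvmlMemory_t mem;
        nvmlDeviceGetMemoryInfo(dev, &mem);

        printf("GPU %d: cudaMemGetInfo free = %zu MiB, NVML free = %llu MiB\n",
               i, freeBytes >> 20, (unsigned long long)(mem.free >> 20));
    }
    nvmlShutdown();
    return 0;
}
```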
Which operating system? Which driver model?
Windows, Driver Version: 537.70, Driver Model WDDM.
The WDDM driver model maintains a virtual memory system for the GPU memory, and it can oversubscribe that memory. When a display GPU is being used for CUDA, the CUDA subsystem uses the WDDM driver to get access to the GPU, so the report is a function of what WDDM is telling the CUDA subsystem. If WDDM chooses to report some value higher than what you believe is “actually available” on the GPU, there isn’t much you can do about it. It might be that WDDM intends to actually make that much memory available to CUDA, and via oversubscription, that may in fact be possible.
Thank you very much for the clarification. I still have two questions:
- Does cudaMemGetInfo return the correct values for the TCC driver model?
- Does nvmlDeviceGetMemoryInfo return correct values for both the WDDM and TCC driver models? As I mentioned above, I find that the free-memory estimate from nvmlDeviceGetMemoryInfo is quite accurate even for the WDDM driver model.
My claim is that all the functions in question return correct values, in all cases and situations.
I’ve explained why I think the WDDM case may not line up with your expectations or may not line up with some other function. That doesn’t mean it is incorrect, although I don’t wish to argue that point. Just stating my opinion.
I personally would not apply the “incorrect” label to anything I see here.
OK. Instead of correct/incorrect values, I should have asked which function gives the most reliable estimate of free memory for programmatically selecting the optimal GPU for high-performance computations. Which function can I rely on to achieve this?
Can you allocate as much memory as is stated by cudaMemGetInfo?
Is the performance worse than when allocating the amount from nvmlDeviceGetMemoryInfo?
I know you would like to get an official recommendation from Nvidia.
I’m not sure what that means. I use memory size as a measure of capability, not optimality.
If a set of GPUs are equally capable (i.e. the memory size is enough) then the determinant of optimality would be other factors like expected performance, which don’t have anything to do with memory size.
From a reliability standpoint, the only sense I can make is “can I reliably allocate the indicated memory?” No. You cannot. Not in any case, in any setting, using either of the functions, for any OS.
It’s well discussed on various forums that you cannot reliably allocate the full indicated amount of memory, whether that amount is indicated via NVML or via CUDA. If you then want to discuss “well, how close can I get to that limit?”, I have no information. I don’t know any reason that you could not get within some delta of the limit suggested by CUDA (or WDDM, if you prefer) or the limit suggested by NVML. I don’t know how to compare those deltas, except experimentally. I know of no documentation or guidance in this area.
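If you wanted to compare them experimentally, a crude probe could look something like the following sketch: start from the reported free amount and step the request down until a single cudaMalloc succeeds. The result will vary with fragmentation and with whatever else the OS/WDDM is doing at the time.

```cpp
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    size_t freeBytes = 0, totalBytes = 0;
    cudaMemGetInfo(&freeBytes, &totalBytes);

    const size_t step = 16ull << 20;   // back off in 16 MiB increments
    size_t request = freeBytes;
    void* ptr = nullptr;
    while (request > 0 && cudaMalloc(&ptr, request) != cudaSuccess) {
        cudaGetLastError();            // clear the error from the failed allocation
        request = (request > step) ? request - step : 0;
    }
    printf("reported free: %zu MiB, largest single allocation achieved: %zu MiB\n",
           freeBytes >> 20, request >> 20);
    if (ptr) cudaFree(ptr);
    return 0;
}
```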
Just to be clear, the original premise reported in this thread was that on WDDM, the report from cudaMemGetInfo may be higher than expected. I agree with that, and have observed it myself, and I believe it is due to the WDDM VMM including oversubscription, and I believe in some cases, from a CUDA perspective, you can allocate “based on” that info. That is, you can allocate more than you might otherwise think, based on your own accounting of what should be in the GPU memory. And I don’t know of any critical reasons not to do that. Sure, it will involve some memory swapping, but from an ML perspective, capability is king for the vast majority of users posting on forums, and more memory == more capability in the ML space in general. Perhaps also in HPC.
I am sorry for not providing sufficient background information earlier to clarify my situation. I have two identical GPUs and want to run two single-GPU applications, A and B. Application A involves both graphics rendering and numerical computations. During its execution, Application A launches Application B, which performs only numerical computations. However, I cannot control which GPU Application A selects for rendering.
To avoid a scenario where both applications run on the same GPU—causing operations to queue while the second GPU remains idle—I want to automatically determine the index of the idle GPU and ensure Application B runs on it. My only available metric for this decision is the estimated free memory of each GPU. I can reasonably assume that the GPU used by Application A for rendering and computations will have less free memory. Ideally, I could switch the non-display GPU’s driver mode to TCC, enabling me to use NVML to identify it. In this case, Application A would use the WDDM GPU for rendering. However, this approach may not be feasible for all users due to various constraints. As a result, I need a method to select the appropriate GPU even when both are operating in WDDM mode. The free memory estimation is purely for selecting the GPU; I am not concerned about whether I can fully allocate the available memory.
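For what it's worth, the selection logic I have in mind is roughly the following sketch: pick the GPU that NVML reports as having the most free memory and map it back to a CUDA device ordinal via its PCI bus ID (since the NVML and CUDA enumeration orders are not guaranteed to match). Error handling is omitted.

```cpp
#include <cstdio>
#include <nvml.h>
#include <cuda_runtime.h>

int main() {
    nvmlInit();
    unsigned int count = 0;
    nvmlDeviceGetCount(&count);

    unsigned long long bestFree = 0;
    char bestBusId[NVML_DEVICE_PCI_BUS_ID_BUFFER_SIZE] = {0};

    for (unsigned int i = 0; i < count; ++i) {
        nvmlDevice_t dev;
        nvmlDeviceGetHandleByIndex(i, &dev);
        nvmlMemory_t mem;
        nvmlDeviceGetMemoryInfo(dev, &mem);
        if (mem.free > bestFree) {
            nvmlPciInfo_t pci;
            nvmlDeviceGetPciInfo(dev, &pci);
            bestFree = mem.free;
            snprintf(bestBusId, sizeof(bestBusId), "%s", pci.busId);
        }
    }
    nvmlShutdown();

    // Translate the winning board into a CUDA device ordinal for application B.
    int cudaOrdinal = 0;
    cudaDeviceGetByPCIBusId(&cudaOrdinal, bestBusId);
    printf("Run application B on CUDA device %d\n", cudaOrdinal);
    return 0;
}
```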
Each GPU has a UUID which is queryable using NVML. Use that, along with host-based IPC of your choice, to let the pair of applications choose together how to run. In nearly all of these cases I suggest using deterministic methods rather than inferential methods to determine which GPU to run on.
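As a minimal sketch of that deterministic approach (error handling omitted): application A publishes the UUID of the GPU it is actually using over whatever host IPC you choose, and application B enumerates the UUIDs and picks the other one.

```cpp
#include <cstdio>
#include <nvml.h>

int main() {
    nvmlInit();
    unsigned int count = 0;
    nvmlDeviceGetCount(&count);
    for (unsigned int i = 0; i < count; ++i) {
        nvmlDevice_t dev;
        nvmlDeviceGetHandleByIndex(i, &dev);
        char uuid[NVML_DEVICE_UUID_BUFFER_SIZE];
        nvmlDeviceGetUUID(dev, uuid, sizeof(uuid));
        // Application A would publish its UUID over host IPC; application B
        // would select a device whose UUID differs from the published one.
        printf("NVML device %u: %s\n", i, uuid);
    }
    nvmlShutdown();
    return 0;
}
```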
I don’t have any further information about how to better interpret memory info.
Can you set an environment variable? → CUDA_VISIBLE_DEVICES
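For example, application B could restrict itself to the chosen GPU before its first CUDA runtime call. This is only a sketch: the variable must be set before CUDA initializes in that process, application A could equally place it in B's environment when launching it, and CUDA_VISIBLE_DEVICES also accepts GPU UUIDs as listed by nvidia-smi -L.

```cpp
#include <cstdlib>
#include <cstdio>
#include <string>
#include <cuda_runtime.h>

int main() {
    int chosen = 1;  // hypothetical: the index picked by the NVML free-memory check
    std::string value = std::to_string(chosen);
#ifdef _WIN32
    _putenv_s("CUDA_VISIBLE_DEVICES", value.c_str());
#else
    setenv("CUDA_VISIBLE_DEVICES", value.c_str(), 1);
#endif
    // From here on, the CUDA runtime enumerates only the chosen GPU, as device 0.
    cudaSetDevice(0);
    size_t freeBytes = 0, totalBytes = 0;
    cudaMemGetInfo(&freeBytes, &totalBytes);
    printf("visible GPU reports %zu MiB free\n", freeBytes >> 20);
    return 0;
}
```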