I’m using Windows 10 and trying to allocate as much as possible of the 6 GB of memory on a GTX 1060 6GB. The main GPU is the integrated one, if that matters at all.
Now, cudaMemGetInfo shows ~5 GB of free memory, while nvmlDeviceGetMemoryInfo shows ~5.8 GB free. In this case, cudaMemGetInfo is right in the sense that I cannot allocate more than 5 GB.
Why do these two functions give different results? Is it possible to increase the memory utilization?
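For reference, here is a minimal sketch of how I understand the two queries, run back to back. It assumes the 1060 is device/NVML index 0 and that the build links against NVML (e.g. nvml.lib on Windows); error checking is omitted:

```cpp
#include <cstdio>
#include <cuda_runtime.h>
#include <nvml.h>

int main() {
    // NVML query: goes through the driver and does not create a CUDA context.
    nvmlInit();
    nvmlDevice_t dev;
    nvmlDeviceGetHandleByIndex(0, &dev);      // assumes the 1060 is NVML index 0
    nvmlMemory_t mem;
    nvmlDeviceGetMemoryInfo(dev, &mem);       // mem.total / mem.used / mem.free, in bytes
    printf("NVML free: %llu MiB of %llu MiB\n", mem.free >> 20, mem.total >> 20);

    // CUDA runtime query: the first runtime call creates a context on the
    // current device, so this "free" value already reflects that overhead.
    size_t rtFree = 0, rtTotal = 0;
    cudaMemGetInfo(&rtFree, &rtTotal);
    printf("CUDA free: %zu MiB of %zu MiB\n", rtFree >> 20, rtTotal >> 20);

    nvmlShutdown();
    return 0;
}
```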
The NVML function is showing you the GPU’s free memory before a CUDA context is established on the device. cudaMemGetInfo is showing you the free memory after a CUDA context has been instantiated on the device. The CUDA context has overhead, and there is no way for you to reduce that overhead.
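One way to see the context cost is to take an NVML reading before and after forcing context creation with cudaFree(0). This is just a sketch under the same assumptions as above (device/NVML index 0, NVML linked in), not something measured here:

```cpp
#include <cstdio>
#include <cuda_runtime.h>
#include <nvml.h>

// Free memory in MiB as seen by the driver, independent of any CUDA context.
static unsigned long long nvmlFreeMiB() {
    nvmlDevice_t dev;
    nvmlMemory_t mem;
    nvmlDeviceGetHandleByIndex(0, &dev);   // assumes the 1060 is NVML index 0
    nvmlDeviceGetMemoryInfo(dev, &mem);
    return mem.free >> 20;
}

int main() {
    nvmlInit();
    unsigned long long before = nvmlFreeMiB();

    cudaFree(0);                           // forces lazy creation of the CUDA context

    unsigned long long after = nvmlFreeMiB();
    printf("NVML free before context:  %llu MiB\n", before);
    printf("NVML free after context:   %llu MiB\n", after);
    printf("Difference (context cost): %llu MiB\n", before - after);

    nvmlShutdown();
    return 0;
}
```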
Even if I call NVML later, while the CUDA computation is running? (That is what I do…)
I probably wouldn’t be able to explain it then. Perhaps you are reading the wrong field of the NVML memory structure.
What does nvidia-smi report in this scenario? It is using the same NVML call, I believe.
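For comparison, the memory counters can be queried directly from the command line, for example:

```
nvidia-smi --query-gpu=memory.total,memory.used,memory.free --format=csv
```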