I have two GPUs: a GeForce GTX 980 and a GeForce GTX 650 Ti. I also have two monitors, both connected to the 650 Ti. But `cudaMemGetInfo` reports only 3,378 MB free on the 980, when it should be close to 4 GB. GPU-Z shows the 980 using only 3 MB, yet I cannot allocate more than 3,378 MB on it. I don't understand what could be causing this strange memory consumption. PhysX uses the GTX 650…
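For reference, a minimal sketch of the kind of query described above. Assuming the GTX 980 is device 0 (on a multi-GPU system you would verify this with `cudaGetDeviceProperties` and select it via `cudaSetDevice`):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main(void)
{
    // Assumption: the GTX 980 is device 0 on this system.
    cudaSetDevice(0);

    size_t free_bytes = 0, total_bytes = 0;
    cudaError_t err = cudaMemGetInfo(&free_bytes, &total_bytes);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMemGetInfo failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("free: %zu MB, total: %zu MB\n",
           free_bytes / (1024 * 1024), total_bytes / (1024 * 1024));
    return 0;
}
```

Note that `cudaMemGetInfo` reports free memory as seen by the driver; it says nothing about whether that memory is allocatable as a single contiguous block.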
If you run nvidia-smi -q you should see a list at the end that shows all the processes currently using the GPU. Unfortunately, with the default WDDM driver under Windows, you will not get information as to how much GPU memory each of those processes is using (since the memory is under control of Windows, not the CUDA driver). But you may be able to spot apps using the GPU that you did not expect to (e.g. browsers like Firefox or Internet Explorer, or background apps like Folding@Home).
Also note that GPU memory may be fragmented, which means you cannot allocate all remaining free memory in one single chunk.
Not sure why "Processes" is showing up as N/A; on my system (64-bit Windows 7, Quadro K2200) it shows a list of the processes using the GPU. Some functionality of nvidia-smi is unavailable on consumer GPUs, but it seems difficult to imagine that the "Processes" section would be one such feature. What OS are you using?
CUDA itself needs around 100 MB, but with zero visibility via nvidia-smi I can't tell you where the rest of the memory goes. Are you accounting for possible fragmentation of the GPU memory by trying to allocate several small blocks rather than one single big block? Have you tried rebooting the machine, in case there is a zombie process still holding on to GPU memory?
If you use fancy 3D features for your Windows desktop, that could eat up GPU memory, in addition to other possibilities already mentioned.
I don’t think there is any indication that anything is broken. There is just too little information present here to find out what the details of the situation are.
What I suggested above is to try allocating memory with smaller granularity, say in chunks of 100 MB, and then see how many chunks you can allocate. Possible fragmentation of the GPU memory means that the remaining free memory may not be allocatable in one single chunk.
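A sketch of that probe, assuming the 100 MB chunk size suggested above (the chunk size and output format are just illustrations):

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

int main(void)
{
    const size_t chunk = 100 * 1024 * 1024;  // 100 MB per allocation
    std::vector<void*> blocks;

    // Keep grabbing 100 MB chunks until cudaMalloc fails; the total
    // tells you how much memory is actually allocatable, fragmentation
    // and all.
    for (;;) {
        void *p = 0;
        if (cudaMalloc(&p, chunk) != cudaSuccess)
            break;
        blocks.push_back(p);
    }
    printf("allocated %zu chunks = %zu MB total\n",
           blocks.size(), blocks.size() * 100);

    for (size_t i = 0; i < blocks.size(); i++)
        cudaFree(blocks[i]);
    return 0;
}
```

If the chunked total comes out noticeably higher than the largest single `cudaMalloc` you can make, fragmentation is the likely explanation for the gap.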