If I run TensorFlow, it shows only 4.97 GB of free memory. If I run Theano, at most 82.3% of the memory can be allocated. If I check with MATLAB, it says:
TotalMemory: 6.4425e+09
AvailableMemory: 5.2556e+09
It seems this issue does not depend on whether I use this card for display or not: I switched to the on-board Intel GPU for display, but it reports exactly the same amount of available memory. nvidia-smi reports only 40 MB of memory in use.
I did a lot of research online but could hardly find any topic discussing this. Is there a way to make the missing 1 GB of memory available for computation?
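For reference, the free/total numbers that TensorFlow, Theano, and MATLAB report can be cross-checked directly against the CUDA runtime, independently of any framework. A minimal sketch (the file name, the build line, and the assumption that the card in question is device 0 are mine):

// check_mem.cu -- print the memory the CUDA runtime itself sees as free/total
// Build (assumed): nvcc check_mem.cu -o check_mem
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    cudaSetDevice(0);                  // assumes the card in question is device 0
    cudaFree(0);                       // force creation of the CUDA context
    size_t freeBytes = 0, totalBytes = 0;
    cudaError_t err = cudaMemGetInfo(&freeBytes, &totalBytes);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMemGetInfo failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("total: %.2f GB, free: %.2f GB\n", totalBytes / 1e9, freeBytes / 1e9);
    return 0;
}

The gap between total and free here is the same overhead the frameworks are surfacing; it is not memory those frameworks have allocated themselves.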
The remaining memory is used by Windows, the WDDM driver, and CUDA.
It’s simply not realistic to assume that because a card has 6 GB of memory, you can allocate all 6 GB. It does not work that way in any setting.
There is some overhead used by Windows WDDM, and also some overhead used by CUDA. Other things like TensorFlow, MATLAB, etc. may also use some memory before you get a chance to allocate any.
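To make that concrete, here is a small sketch of the point: asking the runtime for the card's full capacity fails, while a request sized somewhat below the reported free amount generally succeeds (the file name and the 64 MB safety margin are arbitrary choices of mine):

// alloc_demo.cu -- "total" memory is never fully allocatable
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    cudaFree(0);                                 // establish the CUDA context
    size_t freeBytes = 0, totalBytes = 0;
    cudaMemGetInfo(&freeBytes, &totalBytes);

    void *p = nullptr;
    // The full physical capacity is not available: part of it is held by the
    // WDDM driver, the CUDA context, and any other resident clients.
    cudaError_t err = cudaMalloc(&p, totalBytes);
    printf("cudaMalloc(total): %s\n", cudaGetErrorString(err));
    cudaGetLastError();                          // clear the error state

    // A request below the reported free amount typically succeeds.
    size_t request = freeBytes - (64u << 20);    // leave ~64 MB of headroom
    err = cudaMalloc(&p, request);
    printf("cudaMalloc(free - 64 MB): %s\n", cudaGetErrorString(err));
    if (err == cudaSuccess) cudaFree(p);
    return 0;
}

On a WDDM card the second allocation may still fail if the request is very close to the reported free amount, since the driver manages memory behind CUDA's back; a larger margin may be needed there.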
On Windows, whether a card is used for display or not has no bearing on whether Windows decides to build a WDDM display driver stack on it. For GeForce cards, a WDDM display driver stack will always be built on the card.
The simple answer is to use Linux if you want to avoid the Windows overhead (which may only be on the order of 100-200 MB), or buy a card that has TCC support (e.g. Titan cards), and then only worry about the overhead of your software tools.
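If you are unsure which driver model a particular card ended up with, the CUDA device properties expose it; a small sketch using cudaGetDeviceProperties (file name is mine):

// driver_model.cu -- report TCC vs. WDDM for each CUDA device (Windows)
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        printf("device %d: %s, driver model: %s\n",
               i, prop.name, prop.tccDriver ? "TCC" : "WDDM");
    }
    return 0;
}

The tccDriver field is only meaningful on Windows; on Linux it is simply 0.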
After initialization in MATLAB (no video cables connected to the outputs), that is essentially 355 MB in overhead alone.
Also, unless MATLAB has changed since R2016b, any sort of complex array copied to GPU memory will occupy twice the expected space. Creating the arrays directly in GPU memory does not incur this penalty. Just something I noticed a few months ago.
On my primary card (a GT 640), I get about 480 MB of overhead running two 1080p screens. After initializing MATLAB for that GPU, that number jumps to 564 MB. This is all under Windows 7 x64. Memory usage will change slightly depending on the Windows version; I cannot do an apples-to-apples comparison, but expect the overhead to fluctuate between Windows 7, Windows 8.1, and Windows 10 with the WDDM driver.
You might want to check with the TensorFlow guys. It is possible that WDDM 2.0 (used by Windows 10) has a bigger memory footprint, or that the memory footprint varies with Windows GUI settings, but from what I see, on Windows 7 the combined GPU memory footprint of a CUDA context (about 90 MB) plus the WDDM display driver seems to be around 300 MB, not 680 MB as shown in #3.
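One way to put a rough number on the context part of that footprint is to sample the device-wide used-memory counter through NVML before and after the context is created in the same process. A sketch (the file name, the build line, and the assumption that NVML index 0 matches the card of interest are mine; on a WDDM device the counter is less direct because Windows manages the memory, so treat the delta as an approximation):

// ctx_footprint.cu -- approximate the memory cost of creating a CUDA context
// Build (assumed): nvcc ctx_footprint.cu -lnvidia-ml
#include <cstdio>
#include <cuda_runtime.h>
#include <nvml.h>

static unsigned long long usedBytes(nvmlDevice_t dev) {
    nvmlMemory_t mem;
    nvmlDeviceGetMemoryInfo(dev, &mem);
    return mem.used;
}

int main() {
    nvmlInit();
    nvmlDevice_t dev;
    nvmlDeviceGetHandleByIndex(0, &dev);   // NVML index 0: assumed to be the card of interest

    unsigned long long before = usedBytes(dev);
    cudaSetDevice(0);
    cudaFree(0);                           // forces CUDA context creation
    unsigned long long after = usedBytes(dev);

    printf("used before context: %llu MB\n", before >> 20);
    printf("used after context:  %llu MB\n", after >> 20);
    printf("approx. context footprint: %llu MB\n", (after - before) >> 20);

    nvmlShutdown();
    return 0;
}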
As txbob points out, the fact that you use the GTX 1050 Ti only for compute does not prevent WDDM from creating a context for it.