Hello everyone, I’ve got a Titan V with 12 GB, and when I use TensorFlow with CUDA the device context is always created with 9925 MB of memory, regardless of the project or the version of CUDA or TensorFlow. It also creates a 9925 MB context even when I use a second GPU for the display.
I can’t use anywhere near the full 12 GB. Does anyone else have the same problem? Is my card defective?
Is there a setting I’m failing to set?
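For reference, nothing project-specific is needed to see this; a bare-bones TF 1.x script along the lines of the sketch below creates the same ~9925 MB context. The gpu_options lines are just the standard knobs for capping or growing the reservation, shown for completeness.

```python
# Minimal TensorFlow 1.x example: creating a session is enough to
# initialize the CUDA device context and reserve most of the GPU memory.
import tensorflow as tf

config = tf.ConfigProto()
# Default behaviour is to reserve nearly all free GPU memory up front.
# These knobs change how much TF asks for, but not the ceiling imposed
# by the driver:
config.gpu_options.allow_growth = True                      # allocate on demand
# config.gpu_options.per_process_gpu_memory_fraction = 0.9  # or cap the fraction

with tf.Session(config=config) as sess:
    # Any trivial op forces the device context to be created.
    print(sess.run(tf.constant(1.0)))
```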
It sounds like the Titan V is still being used to drive the display. I assume this is on Linux; you will need to remove the Titan V from the X configuration. Before you do the TF run that creates the 9925 MB allocation/context, what is the output of nvidia-smi?
As long as the GPU is also used to service an operating system’s GUI (it doesn’t matter which OS), you won’t be able to use the GPU’s full memory for CUDA.
This is why txbob suggested that you exclude the Titan V from the X system. How much memory are you able to allocate for CUDA apps when you follow that advice?
Under Windows, you would want to switch to the TCC driver instead of using the default WDDM driver (not sure whether the TCC driver is supported for the Titan V, as I haven’t had a chance to use one).
Note that even if a GPU is used exclusively for CUDA, any CUDA-enabled app will be limited to about 95% of the physical memory on the card, since CUDA itself also needs GPU memory.
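If you want to see that overhead directly, something like the sketch below (using pycuda purely for brevity; cudaMemGetInfo() in a C/C++ program reports the same numbers) prints free vs. total memory right after a context has been created:

```python
# Report free vs. total GPU memory once a CUDA context exists.
# Sketch using pycuda; cudaMemGetInfo() in C/C++ gives the same figures.
import pycuda.autoinit          # creates a CUDA context on device 0
import pycuda.driver as cuda

free_b, total_b = cuda.mem_get_info()
print("total: %7.0f MB" % (total_b / 1024.0**2))
print("free : %7.0f MB" % (free_b / 1024.0**2))
# Even on a GPU that drives no display, 'free' is already a few hundred MB
# below 'total', because the context and the driver themselves need memory.
```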
Ok, I’ve put my other GPU back in this machine and experimented with setting TCC mode with nvidia-smi.
The first thing I’ll note is that I can now grab about 10985 MB of memory on Linux, and slightly more on Windows, which is a significant improvement.
I notice there is no TCC option on Linux; the nvidia-smi man page says it is Windows-only.
Training time per epoch:
Windows w/WDDM: 6.5 seconds
Windows w/TCC: 4.4 seconds
Linux: 4.7 seconds
Linux w/persistence mode: 4.5 seconds (startup much faster than all of the above)
So it would seem Windows with TCC actually gives the best performance. This could be because the Windows driver is from the 398 branch while the Linux one is from 396. I’ve also custom-built TensorFlow against CUDA 9.2 on Linux, but that still didn’t beat Windows running the prebuilt tensorflow-gpu binary on CUDA 9.0; I don’t know whether that is due to the Volta card, insufficient optimizations, or something else.
The TCC driver is exclusive to Windows because it (or something equivalent) is not needed on other operating systems supported by CUDA.
Up to and including Windows XP, Windows provided a driver model with low overhead for graphics devices. The downside was that it made it all too easy for graphics drivers to crash the operating system. So with Windows Vista, Microsoft introduced a new driver model, WDDM 1.x, which gave the operating system a large amount of control over, and isolation from, graphics devices.
For example, with WDDM a graphics driver has to allocate GPU memory through operating system facilities, which creates massive overhead. The NVIDIA drivers try to mitigate this overhead as much as possible, for example by batching kernel launches in CUDA. While this helps overall performance, it can also create performance artifacts.
So NVIDIA came up with an alternative driver, the TCC driver, which tells the operating system to treat the GPU as a 3D controller rather than a graphics device, and therefore incapable of supporting a GUI. This provides a low-overhead driver environment that is competitive, performance-wise, with the Linux driver environment. As you have found, and as I have heard anecdotally from others, it may even be a tad faster.
With Windows 10, Microsoft tightened its control over graphics devices even further with a new driver model variant, WDDM 2.x. One widely observed side effect of this is that CUDA programs cannot allocate more than about 81-82% of the GPU’s physical memory.
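An easy way to see where that ceiling sits on a particular setup is to probe for the largest single device allocation that succeeds. A rough sketch (again using pycuda just for convenience, and assuming device 0 is the card under test):

```python
# Rough probe for the largest single device allocation that succeeds.
# Sketch using pycuda; assumes device 0 is the GPU being tested.
import pycuda.autoinit           # creates a CUDA context on device 0
import pycuda.driver as cuda

_, total = cuda.mem_get_info()
lo, hi = 0, total
while hi - lo > 16 * 1024**2:    # stop once bracketed to within 16 MB
    mid = (lo + hi) // 2
    try:
        buf = cuda.mem_alloc(mid)   # try to allocate 'mid' bytes
        buf.free()
        lo = mid                    # success: raise the lower bound
    except cuda.Error:
        hi = mid                    # failure: lower the upper bound

print("largest single allocation: %.0f MB (%.0f%% of %.0f MB physical)"
      % (lo / 1024.0**2, 100.0 * lo / total, total / 1024.0**2))
```

On a WDDM 2.x system the reported percentage typically lands in that 81-82% range, while under TCC or on Linux it should come out noticeably higher, as the 10985 MB figure above suggests.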