Titan V missing memory?

JosephN · July 14, 2018, 1:03am

Hello everyone, I’ve got a Titan V with 12 GB and when I use Tensorflow with Cuda the device context is always created with 9925 MB of ram, for any project or for any version of Cuda or Tensorflow. It also creates a context with 9925 MB even if I use a second GPU for display.

I can’t use anywhere near 12 GB of ram. Does anyone else have the same problem? Is my card defective?
Is there a setting I’m failing to set?

Thanks!

Joseph

Robert_Crovella · July 14, 2018, 2:14am

It sounds like you’re not properly deactivating the display function. I assume this is on linux. You will need to remove the Titan V from the X system. Before you do the TF run that creates a 9925MB allocation/context, what is the output of nvidia-smi?

JosephN · July 14, 2018, 7:59am

Actually the same happens on Windows, Linux or Windows it doesn’t matter I can’t use more than 10 gb of ram.

As I’m variously booted into, and developing, on either what do I do in Windows as well?

nvidia-smi shows only 581mb in use idling on the desktop, 389mb by X, 76mb by gnome, 113mb by …-token=

Linux driver is 396.24

Joseph

njuffa · July 14, 2018, 12:56pm

As long as the GPU is also used to service an operating system’s GUI (doesn’t matter what OS), you won’t be able to use the full memory of the GPU for CUDA.

This is why txbob suggested that you exclude the Titan V from the X system. How much memory are you able to allocate for CUDA apps when you follow that advice?

Under Windows, you would want to switch to the TCC driver instead of using the default WDDM driver (not sure whether the TCC driver is supported for the Titan V, as I haven’t had a chance to use one).

Note that even if a GPU is used exclusively for CUDA, any CUDA-enabled app will be limited to about 95% of the physical memory on the card, since CUDA itself also needs GPU memory.

JosephN · July 15, 2018, 9:51pm

Ok, I’ve put my other GPU back in this machine and experimented with setting TCC mode with nvidia-smi.

First thing I will note is I can now grab about 10985 MB memory on Linux, slightly higher on Windows, a significant improvement.

I notice that there is no TCC option in Linux, the man page for nvidia-smi says it is only for Windows.

Training time per Epoch

Windows w/WDDM 6.5 seconds per
Windows w/TCC 4.4 seconds per
Linux 4.7 seconds per
Linux w/Persist Mode 4.5 seconds per, startup much faster than all of the above.

So it would seem Windows with TCC actually gives the best performance. This could be because the drivers on windows are 3.98 while the linux ones are 3.96. I’ve also custom built tensorflow against cuda 9.2 on Linux, but that did not overcome Windows running with a prebuilt tensorflow-gpu binary on cuda 9.0, don’t know if that is due to the Volta card, insufficient optimizations or whatever.

Thanks for the knowledge!

Joseph

njuffa · July 15, 2018, 10:51pm

The TCC driver is exclusive to Windows because it (or something equivalent) is not needed on other operating systems supported by CUDA.

Up to and including Windows XP, Windows provided a driver model with low overhead for graphics devices. The downside was that it made it all too easy for graphics drivers to crash the operating system. So with Windows 7, Microsoft introduced a new driver model, WDDM 1.x, that gave the operating system a large amount of control over, and isolation from, graphics devices.

For example, with WDDM a graphics driver needs to allocate memory on the GPU through the operating system facilities. This approach created massive overhead. The NVIDIA drivers try to mitigate this overhead as much as possible, for example by batching kernel launches in CUDA. While this helps overall performance, it can also create performance artifacts.

So NVIDIA came up with an alternative driver, the TCC driver, that tells the operating system to treat the GPU as a 3D Controller, not a graphics device, thus incapable of supporting the GUI. This provides a low-overhead driver environment that is competitive with the Linux driver environment performance wise. As you have found, and I have heard anecdotally from other people, it may even be a tad faster.

With Windows 10, Microsoft grabbed control of graphics devices even harder with a new driver model variant WDDM 2.x. One widely observed side-effect of this is that it is not possible for CUDA programs to allocate more than about 81-82% of the GPU’s physical memory.

Topic		Replies	Views
Titan V slower than 1080ti tensorflow:18.08-py3 and 396.54 drivers Frameworks tensorflow	21	10318	October 12, 2021
Final word on Titan X and TCC? CUDA Programming and Performance	17	10986	September 3, 2018
Idle GPU has non-empty memory usage CUDA Programming and Performance	3	1776	May 10, 2019
2.5GB of video memory missing in TensorFlow on both Linux and Windows [RTX 3080] TensorRT cuda , tensorflow , python , gpu	9	2733	August 6, 2022
Using multiple GPUs to scale an existing Cuda application - failing to allocate memory CUDA Programming and Performance	5	1153	September 4, 2018
[980 Ti, Windows 10, CUDA 7.5] Out of memory after allocating 4.5 out of 6gb CUDA Programming and Performance	7	5109	December 6, 2015
TitanX slower than CPU (Tensorflow), possible configuration issue CUDA Programming and Performance	9	4508	April 13, 2016
Did TensorFlow caused GPU memory crash? CUDA Setup and Installation	5	4914	April 26, 2017
Nvidia driver conflict CUDA_ERROR_NO_DEVICE Linux	10	9707	June 28, 2018
Cuda Runtime version is insufficent for CUDA runtime version. All of my versions match! CUDA Setup and Installation	2	735	October 26, 2018

Titan V missing memory?

Training time per Epoch

Related topics