ASUS GTX Titan Benchmarks, subpar FP64 performance?

Dear all,

I’ve recently put together a system with an ASUS P9X79 WS motherboard, an Intel i7-3970X CPU, 48 GB of DDR3-1866 RAM, and an ASUS GTX TITAN.

I’m using Ubuntu 12.04 LTS.

I’ve installed driver 319.32 and I load it with NVreg_EnablePCIeGen3=1, so that I get PCIe 3.0 speeds.
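
In case it helps anyone setting this up: I pass that module option through a modprobe options file (the file name below is just what I chose), then reboot (running update-initramfs -u first if the nvidia module is included in your initramfs):

# /etc/modprobe.d/nvidia.conf
options nvidia NVreg_EnablePCIeGen3=1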

Device 0: GeForce GTX TITAN
 Quick Mode

 Host to Device Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)	Bandwidth(MB/s)
   33554432			11247.0

 Device to Host Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)	Bandwidth(MB/s)
   33554432			11236.9

 Device to Device Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)	Bandwidth(MB/s)
   33554432			220495.0

I have CUDA 5.0.35 (the linux_64_ubuntu11.10 package).

When I was compiling the nbody sample, the compiler complained about double precision not being supported. I edited the Makefile to pass

-gencode arch=compute_35,code=sm_35

to the compiler, and the warning disappeared.
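
For reference, this is roughly the edit (the GENCODE_FLAGS variable name is how I remember the CUDA 5.0 sample Makefiles; yours may differ in other versions):

GENCODE_FLAGS := -gencode arch=compute_35,code=sm_35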

Now, I am trying to reproduce the benchmark figures from this post: https://devtalk.nvidia.com/default/topic/533200/gtx-titan-drivers-for-linux-32-64-bit-release-/

However, I get:

./nbody -benchmark -numbodies=229376 -device=0 -fp64
> Compute 3.5 CUDA device: [GeForce GTX TITAN]
number of bodies = 229376
229376 bodies, total time for 10 iterations: 91547.242 ms
= 5.747 billion interactions per second
= 172.414 double-precision GFLOP/s at 30 flops per interaction

That is about a quarter of the values reported there.

Single precision looks fine:

./nbody -benchmark -numbodies=229376 -device=0
> Compute 3.5 CUDA device: [GeForce GTX TITAN]
number of bodies = 229376
229376 bodies, total time for 10 iterations: 5238.231 ms
= 100.441 billion interactions per second
= 2008.821 single-precision GFLOP/s at 20 flops per interaction
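
(Sanity check on the arithmetic: 229376^2 pairs × 10 iterations ≈ 5.26e11 interactions, which over 91.5 s is the 5.747e9 interactions/s above, i.e. 172 GFLOP/s at 30 flops per interaction; the single-precision run works out to 5.26e11 / 5.24 s ≈ 1.00e11 interactions/s, i.e. ~2009 GFLOP/s at 20 flops. So the figures are internally consistent and the gap is real throughput.)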

Do you have an idea what is going wrong?

You need to enable the full double-precision checkbox in nvidia-settings. By default, the GTX Titan does not have it enabled.

Keep in mind that if you enable double precision, your bandwidth will be halved.

Thanks! That is very likely the source of the problem, but how do I change that setting on a remote node without an X server? nvidia-smi doesn’t seem to work:

nvidia-smi --gom=1
GOM features not supported for GPU 0000:01:00.0.
Treating as warning and moving on.
All done.

I just wanted to note that I don’t believe enabling DP support cuts the bandwidthTest values; I believe it only cuts the Boost clock speeds.

On the topic of it being a headless node: to set that flag you have a few options, either faking a display (http://blog.cryptohaze.com/2011/02/nvidia-fan-speed-control-for-headless.html or https://sites.google.com/site/akohlmey/random-hacks/nvidia-gpu-coolness), connecting an actual display, or even a fake one (http://blog.zorinaq.com/?e=11).
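
For the fake-display route, the relevant bit is roughly a Device section like this in xorg.conf (only a sketch; the BusID has to match your card, and ConnectedMonitor is the option that convinces the driver a display is attached):

Section "Device"
    Identifier     "Titan"
    Driver         "nvidia"
    BusID          "PCI:1:0:0"
    Option         "ConnectedMonitor" "DFP-0"
EndSection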

millecker recently sent me a PM about bypassing the NVML check for supported GPUs in nvidia-smi. I’m posting the quote at the end of this post. Even if it doesn’t help with setting that flag, it is useful for monitoring other GPU parameters from the command line.

I just tested it on my Ubuntu install with the 3.5.0-21-generic kernel and NVIDIA driver 319.32, and nvidia-smi really does display the additional information with the shim! :)

Among the additional things it lists for my GTX Titan that are otherwise N/A’d are:

    GPU Operation Mode
        Current                     : All On
        Pending                     : All On
    GPU Link Info
        PCIe Generation
            Max                     : 3
            Current                 : 1
        Link Width
            Max                     : 16x
            Current                 : 16x
    Performance State               : P8
    Utilization
        Gpu                         : 0 %
        Memory                      : 0 %
    Power Readings
        Power Management            : Supported
        Power Draw                  : 14.75 W
        Power Limit                 : 250.00 W
        Default Power Limit         : 250.00 W
        Enforced Power Limit        : 250.00 W
        Min Power Limit             : 150.00 W
        Max Power Limit             : 265.00 W
    Clocks
        Graphics                    : 324 MHz
        SM                          : 324 MHz
        Memory                      : 324 MHz
    Max Clocks
        Graphics                    : 1254 MHz
        SM                          : 1254 MHz
        Memory                      : 3004 MHz

Here is the information on how to accomplish this:

Oh, and it is possible to monitor the clocks, but not set them. I’ll just leave this here: ;)

root@Tesla:~# nvidia-smi
Sun Jul 28 13:28:37 2013       
+------------------------------------------------------+                       
| NVIDIA-SMI 5.319.32   Driver Version: 319.32         |                       
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GT 630      Off  | 0000:02:00.0      On |                  N/A |
| N/A   45C    P8    N/A /  N/A |      128MB /  2047MB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX TITAN   Off  | 0000:03:00.0     Off |                  N/A |
| 30%   35C    P8    14W / 250W |       14MB /  6143MB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Compute processes:                                               GPU Memory |
|  GPU       PID  Process name                                     Usage      |
|=============================================================================|
|    0            Not Supported                                               |
+-----------------------------------------------------------------------------+
root@Tesla:~# nvidia-smi --gom=1
GOM features not supported for GPU 0000:02:00.0.
Treating as warning and moving on.
GOM changed to "Compute" for GPU 0000:03:00.0.
All done.
Reboot required.
root@Tesla:~# nvidia-smi -pl 275
Changing power management limit is not supported for GPU: 0000:02:00.0.
Treating as warning and moving on.
Provided power limit 275.00 W is not a valid power limit which should be between 150.00 W and 265.00 W for GPU 0000:03:00.0
Terminating early due to previous errors.
root@Tesla:~# nvidia-smi -pl 265
Changing power management limit is not supported for GPU: 0000:02:00.0.
Treating as warning and moving on.
Power limit for GPU 0000:03:00.0 was set to 265.00 W from 250.00 W.

Warning: persistence mode is disabled on this device. This settings will go back to default as soon as driver unloads (e.g. last application like nvidia-smi or cuda application terminates). Run with [--help | -h] switch to get more information on how to enable persistence mode.

All done.
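
(With the shim loaded, you can also watch the clocks continuously with something like nvidia-smi -q -d CLOCK -l 5, if I remember the loop flag correctly.)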

I should note that, even though I have shown you can enable this GOM flag on the GTX Titan, it is only a power-saving feature. It does not enable full DP support.

That being said, I am not aware of a way to enable the CUDA DP flag without faking or using a display, as I mentioned before, since it is tied directly to nvidia-settings, which requires a running X server. A while ago there was a similar thread here:

https://devtalk.nvidia.com/default/topic/534302/turning-on-dp-for-nvidia-titan-on-headless-server-/

I went through the nvidia-settings source code trying to pin down exactly which setting it changes, but I wasn’t able to isolate it because it is so intertwined with the rest of nvidia-settings. No one commented further and that was the end of the discussion…

I guess there is no way to set the double-precision mode without an X server, but I think that if you do it once, it will stick even if you restart the computer with no X server running.

This might do the trick:

nvidia-settings -a [gpu:0]/GPUDoublePrecisionBoostImmediate=1

Haha, yes, I posted that in the thread I linked to. I’m pretty sure that command still requires a running X server to make the change. After that, I believe it is no longer needed, as pasoleatis said.
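
For the headless case, this is the kind of thing I mean, as a sketch only (run as root, and assuming the nvidia X driver plus a working xorg.conf for the Titan are already on the node):

# start a throwaway X server in the background
X :0 &
XPID=$!
sleep 5
# apply the double-precision setting against that display
DISPLAY=:0 nvidia-settings -a "[gpu:0]/GPUDoublePrecisionBoostImmediate=1"
# shut the X server down again
kill $XPID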