GTX Titan speed and Boost 2.0 under Linux?

Hello,

I plan to purchase six GTX Titans and use them for Amber 12 (CUDA) calculations. What will be the core speed of these cards under Linux: 836, 876, or 973 MHz? The last value is the "normal" one at the 80 °C target under Windows. Does NVIDIA support Boost 1.0 and/or Boost 2.0 under Linux?

I am horrified that most Linux users actually complain about the core speed of the GTX 6xx family and claim that their cards run at much lower speeds. How would you comment on this?

Regards,

NVIDIA’s Linux documentation doesn’t have a single word about GPU Boost, so, taking into consideration this quote: “However, GPU Boost is quite a bit more complex than Turbo Boost, consisting of both a software and a hardware component”, I guess you’ll be stuck with the base clock, which is 837 MHz.

I actually ordered one. Come on, NVIDIA, I ponied up some pretty good cash here and expect a fully functional card under Linux.

I asked the developers of the GPU version of Amber 12 (the CUDA software that I use under Linux) about the GTX Titan, but they don’t know what speed we can actually expect either. As @Birdie pointed out, it is also unclear whether the boost technology works under Linux, and I expect that we will be capped at only 836 MHz too.

I am totally disappointed with the NVIDIA Linux driver developers. I had a problem with a pair of GTX 590s, which never ran at the reference speed under Linux. This problem was reported by many users but never solved. Moreover, how is it possible that, almost a year after the GTX 680 release date, querying the speed of your GPU still returns this:
CUDA Device Core Freq: 0.71 GHz??
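
(I assume that line comes from cudaGetDeviceProperties(); here is a minimal check I wrote, not Amber’s code, to see what the CUDA runtime itself reports, in case anyone wants to compare on their own card.)

// clockcheck.cu - my own minimal test, not Amber's code; prints what the CUDA
// runtime reports as the core clock. Compile with: nvcc clockcheck.cu -o clockcheck
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        printf("No CUDA devices found\n");
        return 1;
    }
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        // prop.clockRate is the core clock in kHz as seen by the runtime;
        // it does not reflect any dynamic boost applied while a kernel runs.
        printf("Device %d: %s, CUDA Device Core Freq: %.2f GHz\n",
               i, prop.name, prop.clockRate / 1.0e6);
    }
    return 0;
}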

If and when someone has test results or an answer from NVIDIA, please post it here or via private message.

Regards,

I was looking at nvidia-smi with more recent drivers, and there are some options there that may be of benefit to the Titan, given that the Titan is similar to the Tesla K20X. There is an option for setting the memory and GPU clock frequencies and an option for setting the power limit in watts. There is also a GPU Operation Mode with the choices ALL_ON, COMPUTE, and LOW_DP. I know that none of these options work with the GTX 680, but perhaps they will work with the Titan.
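
If it helps, the same knobs are exposed programmatically through NVML, the library that nvidia-smi sits on top of. A rough sketch of what I mean is below (it assumes the NVML header from the Tesla Deployment Kit, needs root, and the clock/power/GOM values are examples only); as far as I know these calls are only honored on boards that support them, so they may be rejected on GeForce parts just like the nvidia-smi options:

// nvml_knobs.cu - rough sketch of the nvidia-smi knobs via NVML (example values only).
// Build with: nvcc nvml_knobs.cu -o nvml_knobs -lnvidia-ml
#include <cstdio>
#include <nvml.h>

static void report(const char *what, nvmlReturn_t r)
{
    printf("%-24s : %s\n", what, (r == NVML_SUCCESS) ? "ok" : nvmlErrorString(r));
}

int main()
{
    nvmlDevice_t dev;
    if (nvmlInit() != NVML_SUCCESS) { printf("nvmlInit failed\n"); return 1; }
    if (nvmlDeviceGetHandleByIndex(0, &dev) != NVML_SUCCESS) { nvmlShutdown(); return 1; }

    // application clocks: memory MHz, graphics MHz (K20-style example values)
    report("application clocks", nvmlDeviceSetApplicationsClocks(dev, 2600, 758));

    // power limit is given in milliwatts, e.g. 225 W
    report("power limit", nvmlDeviceSetPowerManagementLimit(dev, 225000));

    // GPU Operation Mode: NVML_GOM_ALL_ON, NVML_GOM_COMPUTE or NVML_GOM_LOW_DP
    report("GPU operation mode", nvmlDeviceSetGpuOperationMode(dev, NVML_GOM_ALL_ON));

    nvmlShutdown();
    return 0;
}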

I can’t speak for the 5xx series or the new Titan cards, but in the case of the 6xx cards, the speed isn’t actually slower, it’s just reported incorrectly by nvidia-settings. At least this is what benchmarks suggest.

Could you please post benchmark results to provide evidence for your statements? I have checked Windows and Linux on the same hardware using the Blender CUDA benchmark scene: Linux is slower!
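
For a more direct comparison than a full Blender scene, something like the toy test below could be compiled with the same toolkit on both systems and the single number compared. It is a made-up kernel of dependent multiply-adds timed with CUDA events, so its run time should track the core clock rather than memory or driver overhead:

// clockbench.cu - toy cross-OS check (made-up kernel, not a real workload);
// the kernel is a chain of dependent FMAs, so run time tracks the core clock.
// Compile with: nvcc -O2 clockbench.cu -o clockbench
#include <cstdio>
#include <cuda_runtime.h>

__global__ void fma_loop(float *out, int inner)
{
    float v = threadIdx.x * 1.0e-3f;
    for (int k = 0; k < inner; ++k)
        v = v * 1.0000001f + 1.0e-7f;                // dependent multiply-adds
    out[blockIdx.x * blockDim.x + threadIdx.x] = v;  // keep the result alive
}

int main()
{
    const int blocks = 2048, threads = 256, inner = 200000;
    float *out;
    cudaMalloc(&out, blocks * threads * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    fma_loop<<<blocks, threads>>>(out, inner);       // warm-up launch
    cudaEventRecord(start);
    fma_loop<<<blocks, threads>>>(out, inner);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("kernel time: %.1f ms\n", ms);            // compare this across OSes

    cudaFree(out);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return 0;
}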

Hope that the memory and GPU clock frequency settings will work…

I received some test results. Here is a comparison of LuxMark results obtained with a GTX 660 under Linux and Windows, respectively:

http://img22.imageshack.us/img22/1279/luxmarkubuntu1204.png
http://img692.imageshack.us/img692/9647/luxmarkwin7.png

According to these results, the GTX 660 runs at 1071 MHz, i.e. at its Boost speed, and the results between Linux and Windows are similar.

However, NVIDIA answered me that the GTX Titan core speed under Linux will be 837 MHz, and said this about the boost technology: “unfortunately no, boost 1.0/2.0 are only supported on windows.”

Thus it is still not clear to me what the clock speed of GTX 6xx GPUs under Linux is. Personally, I trust the above tests :) If they really capped their GTX GPUs to the base clock under Linux, presumably only the BIOS-hack option will be possible, which is… :)

@scix55 can you provide some tests please?

Regards,
Filip

I just set up my EVGA SC GTX Titan and gave it a whirl with my DP CUDA code:

BTW, this is all under Windows 7 x64, as I could not find any Linux drivers that mentioned support for GTX Titan when I checked earlier.

I can use either NVIDIA Inspector or EVGA Precision X to offset the GPU clock by +200 MHz, although it only sporadically hits its target. It seems like the BIOS is coded not to go over 80 or 81 deg C when full DP performance is enabled. Ideas on how to override this are welcome.

A run of my DP CUDA code (98-99% GPU usage) that took about 3 mins 55 secs on a Tesla K20 takes 3 mins 4 secs on the GTX Titan SC. If I bump up the clocks, I can get down to about 2 mins 37 secs… or about a 14% performance increase, which sounds about right for a typical OC.

Average clock rate is 1013 MHz over the length of the simulation, with a max of 1149 MHz and a min of 966 MHz… not bad. For reference, the default clock on this card is 876 MHz with Boost to 928 MHz. Seems like the +200 MHz bump translates to at least a 100 MHz clock boost in CUDA code. (This is with the fan set on auto; see edit below.)
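
(If anyone wants to log clocks like this programmatically rather than with Inspector/Precision, NVML can be polled from a small side program while the simulation runs. A rough Linux-flavored sketch is below, assuming the NVML header from the Tesla Deployment Kit; on Windows swap sleep() for Sleep().)

// clockpoll.cu - rough sketch: poll the SM clock and temperature once a second
// while a CUDA job runs in another process. Build with: nvcc clockpoll.cu -lnvidia-ml
#include <cstdio>
#include <unistd.h>
#include <nvml.h>

int main()
{
    nvmlDevice_t dev;
    unsigned int sm_mhz, temp_c;

    if (nvmlInit() != NVML_SUCCESS) return 1;
    if (nvmlDeviceGetHandleByIndex(0, &dev) != NVML_SUCCESS) { nvmlShutdown(); return 1; }

    for (;;) {                                    // Ctrl-C to stop
        nvmlDeviceGetClockInfo(dev, NVML_CLOCK_SM, &sm_mhz);
        nvmlDeviceGetTemperature(dev, NVML_TEMPERATURE_GPU, &temp_c);
        printf("SM clock: %4u MHz   temp: %u C\n", sm_mhz, temp_c);
        sleep(1);
    }
}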

What is strange to me is that EVGA Precision seems to allow up to a 94 deg C temp target, but I can’t get above 80 or 81 deg C with CUDA computations… I’m assuming this is an artificial boundary set to avoid overheating or computational errors. Even at the overclocked frequencies, though, I get the same (correct) results as I did with the K20… these chips seem to be pretty stable even when OC’ed!

Edit: One thing I noticed is that if I set the fan higher than the automatic setting, the clock rate actually decreases during computation, despite the fact that the GPU does not exceed the 80 deg C threshold that I previously noticed.

Edit 2: If I turn the fan down, it seems I can go up to 95 deg C with no numerical errors and knock about 4 more seconds off the previous times; it seems the average clock rate is about 50 MHz higher. I also downclocked the memory, but I’m not really sure whether that matters much for heat.

Edit 3: I figured out how to get a (more or less) constant overclock. It involves setting the maximum P-state to P2 with the clocks I wanted. With P2 set to 1126 MHz, I get clocks from 1006 to 1123 MHz, with an average of 1052 MHz. With those clocks, the simulation goes down to 2 mins 31 secs. See below for how to set the P-state to P2:

[url]https://forums.geforce.com/default/topic/519233/geforce-600-series/how-can-i-disable-gpu-boost-gtx-670/[/url]

Edit 4: Toyed with increasing P2 to 1226 MHz; that got me 2 mins 28 secs. I start noticing some slight throttling down to the 1 GHz clock range after 70 or 80 deg C. Tried P2 clocks of 1276 MHz and started seeing numerical inaccuracies and slightly more throttling towards the end of the simulation… so my best CUDA clocks (so far) are 1032 to 1215 MHz, with an average of 1093 MHz and a 400 MHz memory underclock (P2 set to 1226 MHz)… not bad!

I wanted to point out that I also saw this, but the memory/GPU clock frequency targets of nvidia-smi, as well as the GPU Operation Mode, are not supported on the Titan; I tried it a bit ago. There is also no support for changing to TCC mode… but more on this below:

I mentioned this on Anandtech also – I’ll repeat it here. Back in the day it was actually possible to convert a GTX 480 into a Tesla C2050… see:
https://devtalk.nvidia.com/default/topic/489965/cuda-programming-and-performance/gtx480-to-c2050-hack-or-unlocking-tcc-mode-on-geforce/

Those two cards I mentioned (GTX 480, Tesla C2050/70) had video outputs. The Tesla K20 and K20X do not have video outputs, but the Titan does, so it might be that flashing a Titan with a K20X BIOS and soft-straps modding as above will at best not give you a display output and at worst might brick your shiny new card. So, needless to say, tread carefully if you decide to try something like that.

So basically it might be possible to restore the missing Tesla features on the Titan, but it might not be trivial.

I have posted some nbody benchmarks here under Linux:
https://devtalk.nvidia.com/default/topic/533200/linux/gtx-titan-drivers-for-linux-32-64-bit-release-/

Benchmarks for Windows 7 x64 (X79 platform, default clocks on the EVGA GTX Titan SuperClocked) are pretty much the same:

[CUDA Bandwidth Test] - Starting...
Running on...

 Device 0: GeForce GTX TITAN
 Quick Mode

 Host to Device Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)        Bandwidth(MB/s)
   33554432                     11130.2

 Device to Host Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)        Bandwidth(MB/s)
   33554432                     11760.7

 Device to Device Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)        Bandwidth(MB/s)
   33554432                     221080.0

I had the same results in the bandwidth testing. However, when rendering with CUDA (not OpenCL, which LuxMark uses), I get different results under Windows and Linux.

Do you mean speed of execution, or…? Chances are the WDDM driver model accounts for the difference on Windows; there’s not much overhead in Linux.

There’s also no good way to overclock in Linux short of a modified BIOS… however, overclocking in Windows is a lot easier… so a bit of a tradeoff there. Of course, you could just do both the modded BIOS AND overclocking in Windows… see this post about my experience with both:

http://www.overclock.net/t/1363440/nvidia-geforce-gtx-titan-owners-club/3720#post_19492000