I just set up my EVGA GTX Titan SC and gave it a whirl with my double-precision (DP) CUDA code.
BTW, this is all under Windows 7 x64, as I could not find any Linux drivers that mentioned support for GTX Titan when I checked earlier.
I can use either NVIDIA Inspector or EVGA Precision X to offset the GPU clock by +200 MHz, although the card only sporadically reaches that target. It seems the BIOS is set up not to go over 80 or 81 °C when full DP performance is enabled. Ideas on how to override this are welcome.
My DP CUDA code, which keeps the GPU at 98-99% usage, took about 3 mins 55 secs on a Tesla K20 and takes 3 mins 4 secs on the GTX Titan SC. If I bump up the clocks, I can get it down to about 2 mins 37 secs… roughly 15% less runtime (about 17% more throughput), which sounds about right for a typical OC.
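The percentage you quote depends on whether you compare runtimes or throughput. Here is a minimal Python sketch of that arithmetic using the wall-clock times from this post; the helper names `to_seconds` and `compare` are just illustrative.

```python
def to_seconds(mmss: str) -> int:
    """Convert a 'minutes:seconds' string such as '3:04' to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def compare(baseline: str, new: str) -> tuple[float, float]:
    """Return (runtime reduction in %, speedup factor) of `new` vs `baseline`."""
    t0, t1 = to_seconds(baseline), to_seconds(new)
    reduction_pct = 100.0 * (t0 - t1) / t0  # fraction of the old runtime saved
    speedup = t0 / t1                       # how many times faster the new run is
    return reduction_pct, speedup


if __name__ == "__main__":
    # Wall-clock times reported in this post
    runs = [
        ("K20 -> Titan (stock)", "3:55", "3:04"),
        ("Titan stock -> OC", "3:04", "2:37"),
    ]
    for label, base, new in runs:
        pct, s = compare(base, new)
        print(f"{label}: {pct:.1f}% less runtime, {s:.2f}x speedup")
```

For the overclock, this works out to 14.7% less runtime but a 1.17x speedup. Both numbers describe the same result.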
The average clock rate over the length of the simulation is 1013 MHz, with a max of 1149 MHz and a min of 966 MHz… not bad. For reference, the default clock on this card is 876 MHz with Boost to 928 MHz. It seems the +200 MHz offset translates to roughly 100 MHz or more of extra clock in CUDA code: the 1013 MHz average is 137 MHz above base and 85 MHz above Boost. (This is with the fan set to auto; see the edits below.)
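One way to collect average/min/max clock figures like these is to sample the SM clock while the simulation runs. The sketch below assumes `nvidia-smi` is on the PATH and that your driver exposes the `clocks.sm` query field for this card. Treat that second point as an assumption, because query support on GeForce boards has varied between driver versions. The function names are hypothetical.

```python
import statistics
import subprocess
import time


def sample_sm_clock_mhz(gpu_index: int = 0) -> int:
    """Read the current SM clock (MHz) once via nvidia-smi's query interface."""
    out = subprocess.run(
        ["nvidia-smi", f"--id={gpu_index}",
         "--query-gpu=clocks.sm", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return int(out.strip().splitlines()[0])


def log_clocks(duration_s: float, interval_s: float = 1.0) -> list[int]:
    """Sample the SM clock every `interval_s` seconds for `duration_s` seconds."""
    samples = []
    end = time.monotonic() + duration_s
    while time.monotonic() < end:
        samples.append(sample_sm_clock_mhz())
        time.sleep(interval_s)
    return samples


def summarize(samples: list[int], base_mhz: int) -> dict:
    """Mean/min/max of the sampled clocks and the mean gain over `base_mhz`."""
    mean = statistics.mean(samples)
    return {
        "mean": mean,
        "min": min(samples),
        "max": max(samples),
        "gain_over_base": mean - base_mhz,
    }
```

Nothing runs at import time. A typical call is `summarize(log_clocks(duration_s=180), base_mhz=876)` from a separate process while the simulation is running.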
What is strange to me is that EVGA Precision seems to allow a temperature target of up to 94 °C, but I can't get above 80 or 81 °C with CUDA computations… I'm assuming this is an artificial limit set to avoid overheating or computational errors. Even at the overclocked frequencies, though, I get the same (correct) results as I did with the K20… these chips seem to be pretty stable even when OC'ed!
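To make "the same (correct) results" concrete, you can compare the overclocked run's output element by element against a reference run (here, the K20 output) using a relative tolerance. This is a minimal sketch. The tolerance defaults and the way the two result arrays get loaded are assumptions, so adapt them to your simulation's output format. The name `mismatches` is hypothetical.

```python
import math
from typing import Sequence


def mismatches(reference: Sequence[float], candidate: Sequence[float],
               rel_tol: float = 1e-12, abs_tol: float = 0.0) -> list[int]:
    """Return indices where `candidate` differs from `reference` beyond tolerance.

    Raises ValueError if the two result sets have different lengths.
    """
    if len(reference) != len(candidate):
        raise ValueError("result sets have different lengths")
    return [
        i for i, (r, c) in enumerate(zip(reference, candidate))
        if not math.isclose(r, c, rel_tol=rel_tol, abs_tol=abs_tol)
    ]
```

An empty list means the overclocked run reproduced the reference within tolerance. The K20 and the GTX Titan are both GK110 parts, so a deterministic kernel may well produce bit-identical output. In that case `rel_tol=0.0` gives a stricter check. Kernels that rely on atomics or reduction order can legitimately differ in the last few bits.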
Edit: One thing I noticed is that if I set the fan higher than the automatic setting, the clock rate actually decreases during computation, even though the GPU does not exceed the 80 °C threshold I noticed previously.
Edit 2: If I turn the fan speed down, it seems I can go up to 95 °C with no numerical errors and knock about 4 more seconds off the previous times; the average clock rate seems to be about 50 MHz higher. I also downclocked the memory, but I'm not really sure whether that matters much for heat.
Edit 3: I figured out how to get a (more-or-less) constant overclock. It involves setting the maximum P-state to P2 with the clocks I wanted. With P2 set to 1126 MHz, I get clocks from 1006 to 1123 MHz, with an average of 1052 MHz. With those clocks, the simulation time drops to 2 mins 31 secs. See below for how to set the P-state to P2:
Edit 4: I toyed with increasing P2 to 1226 MHz, which got me to 2 mins 28 secs. I start noticing some slight throttling down to the 1 GHz clock range after 70 or 80 °C. I tried a 1276 MHz P2 clock and started seeing numerical inaccuracies and slightly more throttling towards the end of the simulation… so my best CUDA clocks (so far) are 1032 to 1215 MHz, with an average of 1093 MHz, using a 400 MHz memory underclock (P2 set to 1226 MHz)… not bad!
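Throttling like this, where clocks sag toward 1 GHz once the card passes 70 to 80 °C, is easier to pin down from a log of (temperature, clock) samples than by watching a monitor. This sketch assumes the samples were written to a file with something like `nvidia-smi --query-gpu=temperature.gpu,clocks.sm --format=csv,noheader,nounits -lms 500`. The thresholds and the function names `parse_log` and `throttle_events` are hypothetical.

```python
def parse_log(lines):
    """Parse 'temp, clock' CSV lines (no header, no units) into (int, int) tuples."""
    samples = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        temp, clock = (int(field) for field in line.split(","))
        samples.append((temp, clock))
    return samples


def throttle_events(samples, target_mhz, hot_c=70, tolerance_mhz=50):
    """Return (index, temp, clock) for samples that are hot AND well below target.

    A sample counts as throttled when the GPU is at or above `hot_c` degrees C
    and its SM clock is more than `tolerance_mhz` below `target_mhz`.
    """
    return [
        (i, t, c) for i, (t, c) in enumerate(samples)
        if t >= hot_c and c < target_mhz - tolerance_mhz
    ]
```

For example, `throttle_events(parse_log(open("clocks.csv")), target_mhz=1226)` lists the hot, slow samples for the 1226 MHz P2 setting. This heuristic cannot tell thermal throttling apart from power-limit throttling, because both show up as low clocks at high temperature.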