Maxwell is amazingly cool and quiet compared to the old days:
[ This was a brilliant video! :) ]
:-) :-) I remembered this little video from around the time I joined NVIDIA.
In the PC world, we have come a long way in terms of power efficiency in the past twenty years. I still recall PSUs with 75% efficiency and the use of linear regulators for DC-DC conversion! Certainly PCs had lower total power consumption back then, but in terms of efficiency, 50% of electrical energy was wasted right off the bat in conversion losses back then.
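As a rough sanity check of that "50% wasted" figure (the numbers are purely illustrative): a 75% efficient PSU feeding a linear regulator stepping 5 V down to 3.3 V — a linear regulator's efficiency is roughly Vout/Vin — delivers only about half the wall power to the load:

```shell
# Illustrative only: 75% PSU efficiency times a linear regulator's
# efficiency of Vout/Vin = 3.3/5 leaves roughly half the wall power.
awk 'BEGIN { printf "%.1f%%\n", 100 * 0.75 * (3.3 / 5) }'
```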
But it seems to me that more of the efficiency focus has been given to semiconductor components like CPUs than to what a former colleague at AMD once called the “unsexy” components, like PSUs and DC-DC converters. Of those two, PSUs seem to have made faster strides: I recently noticed there is now an 80+ Titanium specification, and one can actually buy PSUs that adhere to it. Now if the DC-DC converters for CPUs and GPUs could be brought up to that level of efficiency, that would be awesome.
So I ran nbody from the CUDA 6.5 release directory for 5 minutes and then took this screenshot of the GPU-Z output for one of the two GTX Titan X GPUs:
[url]http://imgur.com/3GEtVLV[/url]
I did notice that the core clock started at about 1189.3 MHz, then dropped a bit as the temperature rose over 70 degrees Celsius.
Using only 38% of TDP.
Very different numbers than allanmac’s output. This is an EVGA Titan X without the ACX cooler they used for the high-clock EVGA GTX 980 SC.
Another item of interest: when I ran one of my large brute-force problems on a single Titan X and looked at the GPU-Z output, both the core and memory clocks fluctuated quite a bit more, and it showed the GPU load at 0%, which seems odd.
@CudaaduC, you have too many TITAN X’s. :)
I think you’re looking at the wrong TITAN X since NBody might run its compute on one GPU and render on whichever GPU is connected to your monitor.
The BIPS/GFLOPS in your screenshot are 25-30% more than the GTX 980, which happens to be very close to the ratio of 3072 cores × 1189 MHz vs. 2048 cores × 1392 MHz (28%)!
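For reference, the arithmetic behind that 28% (assuming 3072 cores at 1189 MHz for the TITAN X and 2048 cores at 1392 MHz for the 980):

```shell
# Raw core-count x clock ratio between the two boards
awk 'BEGIN { printf "%.3f\n", (3072 * 1189) / (2048 * 1392) }'
```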

In my case, I have the monitor connected to a 750 Ti and NBody selects the GTX 980 as the simulation GPU.
Here are the 750 Ti stats while running NBody on the GTX 980 in a very small window:

Make sure you run nbody in benchmark mode (-benchmark), otherwise the graphics synchronization with VSYNC will slow it down. I am pointing this out because I tend to forget that myself, even though I know about it.
One thing to keep in mind in general is that for actively cooled cards there is a feedback loop with positive gain between power draw and temperature. Rising temperature means (1) the ohmic resistance of the electronic components increases, driving up power draw at identical clocks (the difference can be up to 10 W), and (2) the fan needs to spin faster, which can increase power consumption by up to 5 W. Higher power consumption means more waste heat is generated, leading to higher temperatures, …
Luckily, the system has damping characteristics, so you don’t get a runaway thermal/power meltdown. But this does highlight the issue that adequate airflow and reasonable air intake temperature are crucial for full-speed operation. Last I checked, NVIDIA publishes specifications for these operating conditions for Tesla and Quadro cards. Not sure about consumer cards, but to a first approximation one might want to look at the specs of the closest Quadro configuration (so look at Quadro M5000 specs for a Titan X).
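That converging feedback can be sketched with a toy model (all numbers are made up for illustration; this is not a real thermal model of any board):

```shell
# Toy power/temperature feedback: 150 W baseline, +0.25 W per extra degree C,
# +0.3 C per watt above a 25 C ambient. The loop gain is 0.25*0.3 = 0.075 < 1,
# so it converges to a finite steady state instead of running away.
awk 'BEGIN {
  P = 150
  for (i = 0; i < 50; i++) {
    T = 25 + 0.3 * P            # temperature tracks power
    P = 150 + 0.25 * (T - 25)   # power creeps up with temperature
  }
  printf "steady state: %.1f W at %.1f C\n", P, T
}'
```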
I’ve added more data collected with a GTX 980 (reference design).
@njuffa: the boards I tested are housed in tower chassis with decent cooling. The TITAN X is in a fairly tidy case with 3x120mm fans (2x front, 1x rear), little obstruction, ~2 PCI-E slots worth of space above and a bit less below. It’s certainly better cooled than the other case where the 980 is. Ambient temp is around 18-20 C. While it may not be a perfect setup, it surely is far from simply a pathological case of bad cooling. Note that all other power-hungry components (the other GPU, most CPU cores) were idle during these tests.
Also note that I’ve observed very similar behavior before with a TITAN X, a 980 (as well as a GK110) placed in roomy, well-cooled 4U rack-mounted chassis. I just ran a brief test on a Quadro M6000 that’s in such a 4U chassis in an air-conditioned server room - will add results to the spreadsheet later. On our machines, upstart scripts set the fan speed to 75%, but apparently even at that level, under nbody benchmark load, the M6000 throttles slightly compared to the fan cranked up to 100%.
However, putting all this aside, the cooling/fan behavior described seems quite problematic, both for developer and production use. I know people running mini-clusters with dozens of GTX cards who were not fully aware of the fan behavior and were losing 10-15% application performance (that’s why we added some info on addressing the issue to the supporting material of a recent benchmark paper).
Even if cranking up the fan instead of letting the card throttle is not the default behavior, nvidia-smi seems the right place to allow such tweaks. Having to use an X server (and fake multi-screen config if you have >1 card), fan speed override, etc is a serious hurdle (and it’s not documented).
That’s odd, I do hit >180 W power draw with my board (see the spreadsheet). I’m also at 80 C (at a fan speed of ~50% in default mode) with nbody in just a few minutes; with the fan set to 100% I get a stable 66-67 C, and 71 C at 75%. The EVGA fan is AFAIK better than the reference design, but the difference is still surprisingly big (even considering the less-than-stellar chassis airflow in my case).
The power consumption though is just weird. Are we running the same benchmark!? Or perhaps your chip is simply from a later batch, of better quality?
@pszilard: I agree that based on your description of equipment and environment, cooling should be more than adequate. Honestly, I am surprised to read of the thermal issues with Maxwell-class GPUs that you report; I had neither experienced nor heard about such issues before. Now I am curious whether this observed behavior is due to hardware, the VBIOS, or the driver.
If you have not done so already, I would suggest bringing this issue to the attention of NVIDIA, going through your established contacts (to avoid any confusion: I stopped working for NVIDIA a good while ago).
As a workaround, consider moving your machines to a branch office in Kiruna. That should easily provide 30 degrees lower ambient air temperature this time of year; just open the windows :-)
Fun stuff: I can get the 980 to draw up to 207-208 W stable at 3505/1239 MHz with nbody, so something is fishy here, I can’t imagine this is the same workload as allanmac’s.
Of course I had to increase the power limit to 225 W. I had no idea the max is so much higher than the default (on the TITAN X it’s only 25W). :)
@njuffa: Thanks for the heads-up. After the GK110 throttling experience I was only mildly surprised by the throttling in GM200/GM20x, so I did not really think of it as an issue. The management software issues are something I have been quite vocal about, and I’m glad to see the recent shift (as mentioned in another thread).
Kiruna could work for cooling, but I’d rather not live there. ;)
@allanmac: I think I’ve spotted the source of the discrepancy. You probably ran the nbody bench without increasing the -numbodies argument from the default value; that’s why you’re getting only 75% SM utilization and much lower memory usage than I do.
My mistake in the previous post, my spreadsheet did include the correct benchmark command:
while true; do nbody -benchmark -numbodies=256000; done
I am betting it’s because I ran NBody across two GPUs – the 980 for simulation and a 750 Ti for rendering.
I usually keep the big cards headless so I don’t have to deal with UI issues while debugging.
nbody -benchmark shouldn’t require a display GPU. It does not open a graphics window, AFAIK. I haven’t studied the code to see if it does any graphics interop at all in benchmark mode, but I don’t see why it should.
Unfortunately, “nbody -benchmark” only runs for 10 iterations (10.5 msecs) and exits.
Even in the first case where you show 71 C?
As I mentioned in a previous post (#18) and in the header of the spreadsheet I shared, I simply ran the nbody binary in an infinite loop. As starting a process takes some time, this results in a little bit of jitter in the GPU workload, but that’s not too important, I think.
Yes.
You can drive up the execution time substantially by increasing the number of bodies. Try:
nbody -benchmark -numbodies=256000
FTFY:
You can drive up, among other things, the execution time substantially by increasing the number of bodies.
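A back-of-envelope for why -numbodies dominates runtime: the all-pairs kernel does N² interactions per iteration, and IIRC the CUDA sample’s own GFLOP/s figure assumes 20 flops per interaction:

```shell
# Work per iteration at N = 256000, assuming 20 flops per body-body interaction
awk 'BEGIN { n = 256000; printf "%.2f GFLOP per iteration\n", 20 * n * n / 1e9 }'
```

At the default body count the per-iteration work is orders of magnitude smaller, which is why the run finishes almost instantly.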
Can you or anyone else run the same benchmark as I did (nbody with 256k bodies looped) on their boards, please? I’d like to know what the actual clocking behavior of these GM200 cards is. I know that GK110 even with good cooling ends up thermal throttling quite easily (we have at least two dozen cards in my lab), but I don’t have enough data to conclude whether the same is still true for Maxwell, just to a lesser extent.
Just to be clear, for monitoring I used the following command - which gives quite a bit more detailed information than a GPU-Z screenshot:
nvidia-smi dmon -d 2 -s pucvm -i YOUR_DEVICE_ID
To get curves similar to the ones I shared, I plotted power, temperature, and clock speed/10 as a function of time.
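In case it helps anyone reproduce the plots, here is roughly how to pull those three columns out of a captured dmon log (the sample lines and column positions below are just an example; dmon’s layout can differ across driver versions, so check the header of your own log):

```shell
# Extract sample index, power (W), temp (C), and core clock / 10 from a
# captured dmon log; lines starting with '#' are dmon's header rows.
awk '!/^[[:space:]]*#/ { n++; print n, $2, $3, $6 / 10 }' <<'EOF'
# gpu   pwr  temp    sm  mclk  pclk
# Idx     W     C     %   MHz   MHz
    0   185    71    99  3505  1392
    0   183    72    99  3505  1354
EOF
```

The resulting columns can be fed straight into gnuplot or a spreadsheet.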
Thanks!
The nvidia-smi switches you suggest don’t work on Windows 10 and the 359.00 driver.
I ran the 256K nbody benchmark in a tight loop and it does throttle at a peak of 103% TDP and the clock oscillates between 1392 and 1354.
NBody is an impressive abuser of the GPU! :)
The power reported by nvidia-smi seems to momentarily hit 192W (103%) but most of the time is clamped to its 185W (100%) cap.
The green regions in the “Perfcap Reason” graph indicate when the GTX 980 is power limited:
After running for 10 minutes the GPU clock still swings between 1354 and 1392.
The MEM clock and temps remain steady at 1752 and 71C.
IMHO, the “benchmark” switch is a poor choice for determining a GPU’s max sustained flops and power, as 256K bodies is still too short a duration.
Let me know if this is what you’re looking for!
A quick inspection of the nbody code reveals the best way to invoke the app without resorting to a looping script:
nbody -benchmark -i <loops>
… where <loops> is a really large number.
Later… that pushes the GTX 980 to a very steady 99.x% TDP (183-185W), 1354-1366 and 73C.
I don’t want to see any magic smoke coming out of my PC so I’ve turned it off and am ingesting Thanksgiving Day turkey and pumpkin pie.