App for monitoring/changing GPU clock rate Very efficient - compared to Teapot :)

maolimu · September 5, 2009, 9:00pm

Hi,

here’s a little Cocoa app for 10.5 and 10.6 that links against the CUDA driver and monitors the clock rate of the first CUDA device, the one driving the monitor (always?).

The app also has a slider that (if set to a value other than 0) will draw the selected number of triangles per screen refresh. It’s quite interesting to see the clock of the 9400 and 9600 go up and down as the number of triangles is changed.

The app is pretty efficient: at startup it uploads a vertex buffer object filled with random floats, and then reuses this buffer (uploaded only once) with varying buffer offsets to draw the triangles and their colors “randomly”. So there are only very few OpenGL commands issued per frame and NO memory transfer from host to device. Drawing is done using a display link (a thread synced with the display refresh rate). On a MBP it uses about 3% CPU, on a MacPro less than 1%. Sources are included, use and modify at will.
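The "upload once, vary the offset" idea can be sketched without any OpenGL at all. This is only an illustration in Python, not the app's actual code: the vertex layout (three floats per vertex), the buffer capacity and the function names are all assumptions. In OpenGL terms the pair returned per frame would correspond to the `first` and `count` arguments of a `glDrawArrays(GL_TRIANGLES, first, count)` call.

```python
import random

FLOATS_PER_VERTEX = 3        # assumed layout: x, y, z per vertex
VERTICES_PER_TRIANGLE = 3


def make_random_buffer(triangle_capacity, seed=0):
    """Build the float data once, as the app does at startup."""
    rng = random.Random(seed)
    count = triangle_capacity * VERTICES_PER_TRIANGLE * FLOATS_PER_VERTEX
    return [rng.random() for _ in range(count)]


def pick_draw_range(buffer_len, triangles, rng):
    """Choose (first_vertex, vertex_count) for one frame.

    Only the starting offset changes from frame to frame; the buffer
    contents are never touched again after the initial upload.
    """
    total_vertices = buffer_len // FLOATS_PER_VERTEX
    vertex_count = triangles * VERTICES_PER_TRIANGLE
    if vertex_count > total_vertices:
        raise ValueError("buffer too small for requested triangle count")
    spare_triangles = (total_vertices - vertex_count) // VERTICES_PER_TRIANGLE
    first_vertex = rng.randint(0, spare_triangles) * VERTICES_PER_TRIANGLE
    return first_vertex, vertex_count


# Hypothetical capacity: room for 2000 triangles plus 100 spare for offsets.
buffer_data = make_random_buffer(triangle_capacity=2100)
frame_rng = random.Random(1)
for frame in range(3):
    first, count = pick_draw_range(len(buffer_data), 2000, frame_rng)
    print(f"frame {frame}: draw {count} vertices starting at vertex {first}")
```

The point of the design is that the per-frame work on the host is a single offset choice and a single draw call, which is why the CPU load stays so low.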

Using the app, one can actually see what the 9600 GT power management does behind the scenes:

when connected to power, the minimal clock is 0,85 GHz
when running from battery, minimal clock is 0,35 GHz
the increase/decrease in clock rate takes about 3 seconds (move the slider of the app from 0 to 2000 triangles and you can see the increase kick in)
the increase can go from the slowest clock to the highest at once
the decrease cycles through intermediary rates

The app also shows that CUDA is playing well with Apple’s power management (at least on the 9600 GT):

very short kernels will not increase the GPU clock rate (like drawing few rectangles!) so benchmarking these can be misleading
lengthy kernels DO increase the clock (just compare what histogram and simpleStreams do to the clock rate)
memory operations also DO increase the clock, again, only if they are large/frequent enough
when running at 0,85 GHz the host->device and device->host transfers run at full speed, but device->device transfers do not
when running at 0,35 GHz the transfer rates are about half - you can see that using bandwidthTest

I also found a BUG: when an external monitor is connected to the MBP (mid 2009) the 9600 will NOT run at full speed. Yes, that’s right (at least for me) - it will only go as high as 0,85 GHz, never to 1,25 GHz.

I also found a weirdness: when the MBP has the 9600 as main GPU, the 9400 keeps running at full speed (1,1 GHz). This is strange because when just the 9400 is running, its clock will go down as low as 0,3 GHz.

I’ve posted these issues to the Apple bug tracker - but please, do so also.

Mark

MacFan · September 6, 2009, 7:14am

Thank you! Just great. This is a huge step in the right direction. Just tried it out on my lazy GTX 285 and it does the job very well, and seems to take up many fewer resources than the teapot (aka postprocessGL).

Just did a bit of playing around on an ’08 Pro 2.8GHz with a 23in ACD, to try and see the boundaries between the various speeds.

188 triangles per sec (TPS) mostly keeps card in 0.8GHz mode (181 seems to drop to 0.6)

762 TPS keeps card in 1.48GHz mode (755 seems to drop to 0.8 and remain there)

The boundary does not appear to be completely sharp, with maybe occasional drop backs to the lower speed, even at 2000 TPS.

deviceQuery displays speeds consistent with the applet.

For V2 (just to be pushy)
If the dropdowns could be stopped and the window minimized that might be a nice evolution, but one can always put it behind another window. I have two cards up and it just seems to talk to the one driving my display?

Mark - you are a star for making this. Thanks again.
EDIT: I guess the Snow Leopard CUDA bugs need to be fixed before this can work under 10.6. I just tried it and the app runs, but there is no clock speed displayed. Running OpenGL benchmarks sort of suggested the clock speed was not going up. I guess all this has to wait for the CUDA update, so this is no reflection on this nice app.

maolimu · September 6, 2009, 12:59pm

Hi MacFan

interesting, so the GTX 285 has 3 clock rates: 0,6 0,8 and 1,48 GHz.

Can you confirm that lengthy CUDA kernels do change the power state of the card - is power management working correctly after all?

I tested this on a MacPro with 8800 GT: this GPU will always run at full speed, 1,5 GHz. So no power management here.

A few notes:

the number of triangles is per screen refresh, so the actual number of triangles drawn per second is # TRIANGLES * ~60.
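As a quick worked example of that conversion, here is a small Python sketch. The 60 Hz refresh rate is an assumption (the post only says "~60"), and the two slider values are the thresholds MacFan reported above.

```python
def triangles_per_second(triangles_per_refresh, refresh_hz=60.0):
    """Convert the slider value (per screen refresh) to triangles per second."""
    return triangles_per_refresh * refresh_hz


# Slider values reported above as the points where the GTX 285 changes state.
for slider_value in (188, 762):
    print(slider_value, "per refresh is about",
          triangles_per_second(slider_value), "per second")
```

So the thresholds quoted as "TPS" are really on the order of ten thousand and forty-five thousand triangles per second, provided no frames are skipped.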

On the boundary not being completely sharp: one possible cause is that if more draws are requested than the card can process in the display link’s time slot (1/60 sec), the display link will skip frame draws. In that case the frame rate keeps varying, and eventually that influences the clock rate too. There’s a bug in the display link’s actual frame rate code, so I had to implement my own. It calculates the mean frame rate over the last 30 frames.
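A mean frame rate over the last 30 frames can be computed roughly like this. It is a Python sketch of the general technique, not the app's Cocoa code; the class name and the use of plain timestamps in seconds are assumptions.

```python
from collections import deque


class FrameRateMeter:
    """Mean frame rate over a sliding window of recent frame intervals."""

    def __init__(self, window=30):
        self._intervals = deque(maxlen=window)  # oldest interval drops out
        self._last_timestamp = None

    def tick(self, timestamp):
        """Record that a frame was drawn at `timestamp` (seconds)."""
        if self._last_timestamp is not None:
            interval = timestamp - self._last_timestamp
            if interval > 0:
                self._intervals.append(interval)
        self._last_timestamp = timestamp

    def fps(self):
        """Frames per second: number of intervals divided by their total time."""
        if not self._intervals:
            return 0.0
        return len(self._intervals) / sum(self._intervals)


meter = FrameRateMeter()
for frame in range(100):
    meter.tick(frame / 60.0)      # steady 60 Hz
print(round(meter.fps(), 2))

for frame in range(1, 31):
    meter.tick(99 / 60.0 + frame / 30.0)   # every other refresh skipped
print(round(meter.fps(), 2))
```

Dividing the frame count by the summed intervals (rather than averaging the per-frame rates) keeps a single very short or very long frame from dominating the reading.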

the app can be minimized, and it will now display the GPU’s clock rate in the dock icon.

Yes, in CUDA 2.3 there’s no way to get the OpenGL context for a CUDA device (or the other way around) without resorting to string comparing the OGL renderer and CUDA device name - urgh. I don’t have any Mac with more than one GPU so I don’t think I’ll change that. Actually, now that I know CUDA kernels do wake the card, the app could be re-written using CUDA only, no OpenGL… but it does what I need as it is.
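For what it's worth, the string-comparison workaround mentioned here might look something like the Python sketch below. The sample renderer and device strings are assumptions for illustration, as is the function name; the real values would come from `glGetString(GL_RENDERER)` and the CUDA device properties.

```python
def match_cuda_device(gl_renderer, cuda_device_names):
    """Return the index of the CUDA device whose name appears in the
    OpenGL renderer string, or None if nothing matches.

    The longest matching name wins, so a short name that happens to be a
    prefix of a longer one does not shadow it.
    """
    renderer = gl_renderer.lower()
    best_index, best_length = None, 0
    for index, name in enumerate(cuda_device_names):
        candidate = name.strip().lower()
        if candidate and candidate in renderer and len(candidate) > best_length:
            best_index, best_length = index, len(candidate)
    return best_index


# Hypothetical strings for a dual-GPU MacBook Pro.
renderer = "NVIDIA GeForce 9600M GT OpenGL Engine"
devices = ["GeForce 9400M", "GeForce 9600M GT"]
print(match_cuda_device(renderer, devices))
```

This also shows why the approach is unattractive: with two identical cards installed, both names match equally well and the comparison cannot tell them apart.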

The app is working fine for me in 10.6 - I’m using it to compare memory transfer speeds at different clock rates between CUDA and OpenCL. Either your CUDA install is not OK (see a post about installing CUDA in 10.6) or you are booted in 64-bit kernel mode, in which case the CUDA kext fails to load and no device is found.

About OpenCL: in my tests (9600 GT and 8800 GT) OpenCL is about 15-20% slower than CUDA for pinned memory transfers (in both directions). Device transfers are the same in both. That’s actually very encouraging for OpenCL so far… I was expecting it to be much worse :)

Mark

MacFan · September 7, 2009, 11:49am

Yep, there are three states. The problem is tripping it into 1.48 mode with a pure computation. If you run nbody it goes to 1.48 after a few seconds. But MonteCarloMultiGPU stays in 0.6 mode. We need the fast computation without having to run 3D graphics.

Will check my 10.6 install!

Thanks

MacFan · September 7, 2009, 4:11pm

OK, with my 10.6 install the CUDA kext had fallen out again so I reinstalled. Now your app loads and does display a clock rate, of 1.48.

However, there is something not quite truthful about this because the OpenGL extensions viewer test still takes several seconds to go to higher speed, so I am confused.

So I went to see if I can make deviceQuery and all the other stuff, but I am in the same mess as all the others except you and one or two others. My 10.6 install is still not working properly though. I have done the things in the other thread about adding -m32 here and there, but have no idea about how to change the symlinks from 4.2 to 4.0 - what exactly did you edit where to effect this?

maolimu · September 7, 2009, 4:53pm

Take a look here http://forums.nvidia.com/index.php?showtopic=105940

It worked for me on 3 Macs.

MacFan · September 7, 2009, 6:42pm

Indeed - sorry - I missed the sudo instructions at the top of the thread! DOH!

zepharus1 · September 10, 2009, 2:29pm

OK, I have a GTX and Snow Leopard. I am trying desperately to get this working on my Mac Pro ’09. I am relatively new to Mac and CUDA, so I’m sorry if I am raw. The issue I am having is that after installing all the files (the 2.3.1 drivers, the SDK, the toolkit and the others), when I build and run the Xcode project it gives me an error. I even reinstalled the driver twice after reading that this fixed the issue… nothing.

External Media External Media

How do I fix this so I can get my GTX 285 working correctly in 3D games?

maolimu · September 10, 2009, 2:46pm

hi zepharus1,

as you can see in the screen shot, one needed library “libcuda” is marked red in Xcode, meaning it was not found where it is supposed to be.

I have an idea what that might be: in the Finder, choose Go->Go to Folder from the menu and enter this: /usr/local/cuda
Once you have the Finder there, check whether the folder “lib” is readable or whether its permissions prevent you from opening it (in that case its icon will have a red banner).

If the folder has permissions that prevent you from reading its contents - bingo - just fix that. (You can do it in the Finder by hitting CMD-I, or from the console.)
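The same check can be done from a script instead of the Finder. This is a minimal Python sketch; the function name is made up, and the path is simply the one from this post plus the “lib” folder.

```python
import os


def describe_dir_access(path):
    """Report whether the current user can list and enter a directory."""
    if not os.path.isdir(path):
        return "missing"
    # Listing a directory needs read permission, entering it needs execute.
    if os.access(path, os.R_OK | os.X_OK):
        return "readable"
    return "no access"


print(describe_dir_access("/usr/local/cuda/lib"))
```

If this reports "no access", the folder's permissions are the likely reason Xcode shows libcuda in red.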

I had one install with this problem some days ago.

Let’s just hope the new installer… supposed to be here already… will fix these issues.

zepharus1 · September 10, 2009, 5:36pm

Wow, thank you so much… you saved what little hair I had left from being pulled out. Thank you for taking the time… Now to my next issue. I get the program to work, but it says NO CUDA DEVICE FOUND and the GPU clock does not change… :(

maolimu · September 10, 2009, 5:46pm

Hum, so libcuda is there, but shows no device.

Are you using a 64bit kernel?

That would prevent the CUDA kernel extension from being loaded, which would get you no device found. If that’s the case, all CUDA SDK samples would also not run.

zepharus1 · September 10, 2009, 5:52pm

I do not believe that I am using the 64 bit kernel…I did nothing to enable it anyway. Thoughts?

maolimu · September 10, 2009, 8:12pm

On some Mac Pros the 64-bit kernel is the default. Unless you change that, CUDA will not work (until they release 64-bit kernel extensions).

Look at About This Mac, More Info, then Software, or open Activity Monitor and see if the process kernel_task is ‘Intel (64 bit)’.

Or, take a look at this nice app: http://www.ahatfullofsky.comuv.com/English…ms/SMS/SMS.html
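Another quick way to check is the machine string the kernel reports (what `uname -m` prints). The Python sketch below assumes the Snow Leopard behaviour on Intel Macs, where a 32-bit kernel reports `i386` and a 64-bit kernel reports `x86_64`; the function name is made up, and on other systems the strings will differ.

```python
import platform


def kernel_bits(machine):
    """Map the kernel's machine string to 32 or 64 (None if unrecognised)."""
    # Assumption: Intel Mac on 10.6, where the string reflects the kernel mode.
    known = {"i386": 32, "x86_64": 64}
    return known.get(machine)


# platform.machine() returns the same string as `uname -m`.
print(platform.machine(), "->", kernel_bits(platform.machine()))
```

On a 10.6 machine where this prints 64, the CUDA kext discussed above would not load and no device would be found.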

maolimu · September 11, 2009, 8:56pm

2.3a will now always report the GPU’s max clock rate, not the actual one.

You can still see that the GPU clock goes up by the speed with which the triangles are drawn, but the clock rate indicator is now useless.

Looks like we are not supposed to see the inner workings of the drivers. Too bad.

tmurray · September 11, 2009, 9:42pm

this is an oversight because the clock rate returned by the device property is the clock rate that will actually be used by a CUDA app on every other platform.

so yeah argh I am investigating whether a 2.3b is feasible at this point

maolimu · September 11, 2009, 9:46pm

Thx.

I should have asked before assuming anything:)

MacFan · September 12, 2009, 9:43am

Thanks from me too, as we really do need to see the actual clock speed to get a grip on how fast our code is running and whether we are really running at full speed.

cgbeige · September 21, 2009, 8:11pm

Hi - I’m trying to download this but the link just opens a new window and nothing downloads.

maolimu · September 21, 2009, 9:42pm

Well, I tried to upload it again, but got “Upload failed. Please ask the administrator to check the settings and permissions”

If you PM me an email I can send you the sources.

mark

mgstauffer · October 7, 2009, 3:20am

Any news on “2.3b” that will report actual clock rate?

Thx
