Here’s a little Cocoa app for 10.5 and 10.6 that links against the CUDA driver and monitors the clock rate of the first CUDA device, the one driving the monitor (always?).
The app also has a slider that (if set to != 0) will draw the selected number of triangles per screen refresh. It’s quite interesting to see the clock of the 9400 and 9600 go up and down as the number of triangles changes.
The app is pretty efficient: at startup it uploads a vertex buffer object filled with random floats, and then draws the triangles and their colors “randomly” by reusing that once-uploaded buffer with varying buffer offsets. So only very few OpenGL commands are issued per frame and NO memory is transferred from host to device. Drawing is done using a display link (a thread synced with the display refresh rate). On a MBP it uses about 3% CPU, on a Mac Pro less than 1%. Sources are included, use and modify at will.
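For the curious, the “upload once, draw from varying offsets” trick can be sketched like this (a sketch only: the function name and buffer size are made up, not the app’s actual code):

```c
#include <stdlib.h>

/* The VBO holds TOTAL_VERTS random vertices, uploaded once at startup.
 * Each frame we pick a random, triangle-aligned first vertex so that a
 * draw of `tris` triangles stays inside the buffer. Only this offset
 * changes per frame, so nothing is re-uploaded to the device. */
#define TOTAL_VERTS 60000 /* assumed size of the pre-uploaded buffer */

int pick_first_vertex(int tris)
{
    int verts_needed = tris * 3;
    int max_first = TOTAL_VERTS - verts_needed; /* last legal start */
    if (max_first <= 0)
        return 0;                               /* draw from the top */
    int first = rand() % (max_first + 1);
    return first - first % 3;                   /* triangle-aligned */
}

/* Per frame one would then issue a single
 * glDrawArrays(GL_TRIANGLES, first, verts_needed); call. */
```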
Using the app, one can actually see what the 9600 GT power management does behind the scenes:
when connected to power, the minimal clock is 0.85 GHz
when running from battery, the minimal clock is 0.35 GHz
the increase/decrease in clock rate takes about 3 seconds (move the app’s slider from 0 to 2000 triangles and you can see the increase kick in)
the increase can jump from the slowest clock straight to the highest
the decrease cycles through intermediary rates
The app also shows that CUDA plays well with Apple’s power management (at least on the 9600 GT):
very short kernels will not increase the GPU clock rate (like drawing a few rectangles!), so benchmarking these can be misleading
lengthy kernels DO increase the clock (just compare what histogram and simpleStreams do to the clock rate)
memory operations DO increase the clock too, but again only if they are large/frequent enough
when running at 0.85 GHz, host->device and device->host transfers run at full speed, but device->device transfers do not
when running at 0.35 GHz, the transfer rates are about half - you can see that using bandwidthTest
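For reference, the number bandwidthTest prints is essentially just bytes moved over elapsed time; here is a minimal sketch of that arithmetic (the actual cudaMemcpy and the timing code are omitted, and the function name is made up):

```c
#include <stddef.h>

/* Effective bandwidth in MB/s (MB = 2^20 bytes, as bandwidthTest
 * reports it): bytes transferred divided by elapsed seconds. */
double bandwidth_mb_s(size_t bytes, double seconds)
{
    return (double)bytes / (1024.0 * 1024.0) / seconds;
}
```

Measuring the same copy at both clock rates and comparing the two results is how the “about half” figure above shows up.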
I also found a BUG: when an external monitor is connected to the MBP (mid 2009), the 9600 will NOT run at full speed. Yes, that’s right (at least for me) - it will only go as high as 0.85 GHz, never up to 1.25 GHz.
I also found a weirdness: when the MBP has the 9600 as its main GPU, the 9400 keeps running at full speed (1.1 GHz). This is strange, because when just the 9400 is running, its clock will go down as low as 0.3 GHz.
I’ve posted these issues to the Apple bug tracker - but please do so as well.
Thank you! Just great. This is a huge step in the right direction. Just tried it out on my lazy GTX 285 and it does the job very well, and seems to take up far fewer resources than the teapot (aka postprocessGL).
Just did a bit of playing around on an ’08 Mac Pro 2.8 GHz with a 23in ACD, to try to find the boundaries for the various speeds.
188 triangles per sec (TPS) mostly keeps the card in 0.8 GHz mode (181 seems to drop to 0.6)
762 TPS keeps the card in 1.48 GHz mode (755 seems to drop to 0.8 and remain there)
The boundary does not appear to be completely sharp, with occasional drops back to the lower speed, even at 2000 TPS.
deviceQuery displays speeds consistent with the applet.
For V2 (just to be pushy)
If the dropdowns could be stopped and the window minimized, that might be a nice evolution, but one can always put it behind another window. I have two cards installed and it just seems to talk to the one driving my display?
EDIT: I guess the Snow Leopard CUDA bugs need to be fixed before this can work under 10.6. I just tried it and the app runs, but there is no clock speed displayed. Running OpenGL benchmarks sort of suggested the clock speed was not going up. I guess all this has to wait for the CUDA update, so it is no reflection on this nice app.
Interesting, so the GTX 285 has 3 clock rates: 0.6, 0.8 and 1.48 GHz.
Can you confirm that lengthy CUDA kernels do change the power state of the card - is power management working correctly after all?
I tested this on a Mac Pro with an 8800 GT: this GPU will always run at full speed, 1.5 GHz. So no power management here.
A few notes:
the number of triangles is per screen refresh, so the actual number of triangles drawn per second is # TRIANGLES * ~60.
OK, one possible cause for this: if more draws are requested than the card can process in the display link’s time slot (1/60 sec), the display link will skip frames. In that case the frame rate will keep varying, and eventually that influences the clock rate too. There’s a bug in the display link’s actual-frame-rate code, so I had to implement my own; it calculates the mean frame rate over the last 30 frames.
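The rolling mean could look something like this (a sketch with made-up names, fed with a host timestamp from the display link callback; not the app’s actual code):

```c
#include <stddef.h>

#define WINDOW 30 /* frames averaged over */

typedef struct {
    double stamps[WINDOW]; /* ring buffer of frame timestamps (seconds) */
    size_t count;          /* frames seen so far, capped at WINDOW */
    size_t head;           /* next slot to overwrite */
} FrameRate;

/* Record one frame timestamp and return the mean fps over the window,
 * or 0 while fewer than two frames have been seen. */
double framerate_tick(FrameRate *fr, double now)
{
    fr->stamps[fr->head] = now;
    fr->head = (fr->head + 1) % WINDOW;
    if (fr->count < WINDOW)
        fr->count++;
    if (fr->count < 2)
        return 0.0;
    /* once the buffer is full, the oldest stamp is the slot head now
       points at; before that, it is slot 0 */
    size_t oldest = (fr->count == WINDOW) ? fr->head : 0;
    double span = now - fr->stamps[oldest];
    return (span > 0.0) ? (fr->count - 1) / span : 0.0;
}
```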
the app can be minimized, and it will now display the GPU’s clock rate in the dock icon.
Yes, in CUDA 2.3 there’s no way to get the OpenGL context for a CUDA device (or the other way around) without resorting to string-comparing the OGL renderer and CUDA device name - urgh. I don’t have any Mac with more than one GPU, so I don’t think I’ll change that. Actually, now that I know CUDA kernels do wake the card, the app could be rewritten using CUDA only, no OpenGL… but it does what I need as it is.
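Sketched out, that ugly matching is just a substring test (the renderer and device strings below are illustrative examples, not guaranteed to match what the real driver reports):

```c
#include <string.h>

/* CUDA gives a device name (e.g. "GeForce 9600 GT") and OpenGL gives a
 * renderer string (e.g. "NVIDIA GeForce 9600 GT OpenGL Engine"); with
 * no proper interop query in CUDA 2.3, checking whether the device
 * name appears inside the renderer string is about the best one can do. */
int renderer_matches_device(const char *gl_renderer, const char *cuda_name)
{
    return strstr(gl_renderer, cuda_name) != NULL;
}
```

Note that with two identical cards this matching is ambiguous, which is one reason it is so unsatisfying.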
It’s working fine in 10.6 - I’m using it to compare memory transfer speeds at different clock rates between CUDA and OpenCL. Either your CUDA install is not OK (see a post about installing CUDA in 10.6) or you are booted in 64-bit kernel mode, in which case the CUDA kext fails to load and no device is found.
About OpenCL: in my tests (9600 GT and 8800 GT), OpenCL is about 15-20% slower than CUDA for pinned memory transfers (in both directions). Device-to-device transfers are the same in both. That’s actually very encouraging for OpenCL so far… I was expecting it to be much worse :)
Yep, there are three states. The problem is tripping it into 1.48 mode with a pure computation. If you run nbody it goes to 1.48 after a few seconds, but MonteCarloMultiGPU stays in 0.6 mode. We need the fast computation without having to run 3D graphics.
OK, with my 10.6 install the CUDA kext had fallen out again so I reinstalled. Now your app loads and does display a clock rate, of 1.48.
However, there is something not quite truthful about this, because the OpenGL extensions viewer test still takes several seconds to go to the higher speed, so I am confused.
So I went to see if I can build deviceQuery and all the other stuff, but I am in the same mess as all the others except you and one or two more. My 10.6 install is still not working properly. I have done the things in the other thread about adding -m32 here and there, but have no idea how to change the symlinks from 4.2 to 4.0 - what exactly did you edit, and where, to effect this?
OK, I have a GTX and Snow Leopard. I am trying desperately to get this working on my Mac Pro ’09. I am relatively new to Mac and CUDA, so I’m sorry if I am raw. The issue I am having: after installing all the files (the 2.3.1 driver, the SDK/toolkit and the others), when I build and run the Xcode project it gives me an error. I even reinstalled the driver twice after reading that fixed the issue… nothing.
As you can see in the screenshot, one needed library, “libcuda”, is marked red in Xcode, meaning it was not found where it is supposed to be.
I have an idea what that might be: in the Finder, choose Go -> Go to Folder from the menu and enter this: /usr/local/cuda
Once you have the Finder there, check whether the folder “lib” is readable or whether its permissions prevent you from opening it (in which case its icon will have a red banner).
If the folder has permissions that prevent you from reading its contents - bingo - just fix that. (You can do it in the Finder by hitting Cmd-I, or from the console.)
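If you’d rather check from code than in the Finder, here’s a tiny POSIX sketch (the path comes from the post above; note that R_OK only tests readability - X_OK would additionally be needed to actually enter the directory):

```c
#include <unistd.h>

/* Returns 1 if the current user can read the directory at `path`,
 * 0 otherwise (including when the path does not exist). This is the
 * programmatic version of the red-banner check in the Finder. */
int dir_is_readable(const char *path)
{
    return access(path, R_OK) == 0;
}

/* Usage: dir_is_readable("/usr/local/cuda/lib") should return 1 on a
 * healthy install; 0 means broken permissions (or a missing folder). */
```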
I had one install with this problem some days ago.
Let’s just hope the new installer… supposed to be here already… will fix these issues.
Wow, thank you so much… you saved what little hair I had left from being pulled out. Thank you for taking the time… Now to my next issue: I get the program to work, but NO CUDA DEVICE FOUND, and the GPU clock does not change… :(