CUDA on 9400M - lowered Cinebench scores?

I was experimenting with CUDA on the 9400M in a Mini under Mac OS X and hit a crash that forced me to restart. The demos generally ran fine.

Then I decided to benchmark my machine using Cinebench 10 and I got a score of around 2600. I had benchmarked this machine before and had an OpenGL score of 3600+. Whether I uninstall CUDA, unload the kext or flush the system caches, Cinebench won’t go higher than 2700 any more.

I booted into Windows XP on the same machine, where I haven’t used CUDA, and bizarrely it still scores 3600+ in Windows Cinebench 10.

I have another Mini where I haven’t installed CUDA and the Cinebench score is 3600+. I will install it tomorrow on that machine and check to see for certain if something is happening just by installing and using CUDA.

Other OpenGL benchmarks seem to be ok but perhaps there is something the Cinebench app is doing like shader calculations that are being slowed down. I don’t understand what could have happened for the score to drop so much.

Also, I notice that my 9400M is showing up as 0.3GHz. Is this the same speed that others with the 9400M get? I’d seen some people reporting 0.4GHz on other Macs with a 9400M. The Nvidia tool in Windows shows the speed as 350MHz.

Did you install CUDA 2.0 or 2.1?

First 2.0, ran some of the demos, then I installed 2.1 over it.

edit: to expand on it, I installed 2.0, ran a few of the graphics demos like the fluidsGL one and the denoise one and then went into the Mandelbrot demo. I panned around a bit then zoomed right in until it got blocky. Then I tried increasing the detail using the d/D keys - I pressed those a few times. They didn’t seem to do anything but I noticed the interface had locked up. I restarted and tested some other demos. Then I installed 2.1 to get the smoke particles demo.

It’s hard to say if the demos are running slower but the fps values seem to be lower.

Ok, I installed the CUDA software on the other Mini and it hasn’t affected my scores at all. I’m actually getting as much as 4100 in Cinebench 10 on that one, but the first Mini is stuck at 2700 max. That is a drop of about a third.

I’ve run the demos and I’m definitely getting faster scores on one than the other.

fluidsGL gets 25fps steady
nbody n=5000 gets 45fps - 1.1BIPS - 22GFlops
smokeParticles gets 12fps

On the slower one
fluidsGL gets about 15fps
nbody n=5000 gets 30-35fps
smokeParticles gets 8fps

I seem to have lost a massive amount of processing power somehow. It’s about 30-50% slower. Is there something I can do to diagnose what is causing this slowdown? I have the developer tools but I couldn’t see anything obvious to monitor in the OpenGL profiler.
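To put rough numbers on the gap, here is a small Python sketch that uses only the frame rates quoted above. Taking the midpoint of the 30-35fps nbody range is an assumption made for illustration, and the function name is hypothetical.

```python
# Frame rates (fps) quoted in the posts above.
# Assumption: the 30-35fps nbody range on the slow Mini is represented by its midpoint.
fast = {"fluidsGL": 25.0, "nbody": 45.0, "smokeParticles": 12.0}
slow = {"fluidsGL": 15.0, "nbody": 32.5, "smokeParticles": 8.0}


def percent_slower(fast_fps, slow_fps):
    """Return how far slow_fps falls below fast_fps, as a percentage of fast_fps."""
    return (fast_fps - slow_fps) / fast_fps * 100.0


slowdowns = {name: percent_slower(fast[name], slow[name]) for name in fast}

for name, pct in slowdowns.items():
    print(f"{name}: {pct:.0f}% slower")
# fluidsGL: 40% slower
# nbody: 28% slower
# smokeParticles: 33% slower
```

On these figures the slow Mini loses somewhere between roughly 28% and 40% depending on the demo, which is in the same ballpark as the drop seen in Cinebench.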

I don’t think it’s a hardware problem given that the same hardware runs fine with Windows so it looks as though something is messed up on the Mac side software-wise. I’ll check to see what other kexts are installed to see if there’s maybe a 3rd party app that is affecting it.

So, now I’ve reinstalled the Mac system, booted into safe mode, tried another user account, reset the SMC, reset the PRAM, repaired permissions and this is still happening. Performance is 30% or so less on the Mac side than the Windows side and vs another Mac of the same kind, same OS, same software.

I don’t think it’s a hardware issue because the hardware seems to be working fine and since reinstalling the system didn’t work, it can’t be a kernel extension issue.

The only thing that I can think of that I’ve done differently on this machine is that under Windows, I installed the Nvidia system tools to check my GPU clock. I ran the Find Optimal test that was under the GPU performance tab and it blue-screened during the test.

I didn’t overclock it though - I am aware that this can cause issues and so I didn’t move any of the sliders. If this find optimal test actually affects the firmware, it should make this clear as that wasn’t my intention in using it. I wanted to see what the optimal clock speed was because I know Apple underclocks these chips. When I go back into the tab, it lists the clock rate as set at factory defaults 350MHz core clock and 800MHz shader clock (which is lower than the spec of the 9400M(G) - 450MHz, 1100MHz). There doesn’t seem to be anything changed so I don’t know what damage it may have done.

One game on the Windows side did come up with a message saying that the hardware seems to have changed. Plus, I have found console messages on the Mac side saying IGPU: family specific matching fails but I don’t know if that’s normal. It recognises the GPU device ok and the OpenGL device driver app monitors the GPU.

Is it possible that this find optimal test has somehow affected the Mac firmware on the GPU only? Maybe the EFI settings and not the Windows BIOS? If so, how do I go about fixing this issue or at least diagnosing if the firmware was affected in any way? I guess I could try adjusting the performance slider in the Nvidia tool to see if it sets things back the way they should be but I really didn’t want to change this stuff in case it makes the machine unbootable altogether.

Edit: it seems that Find Optimal is actually changing the clock speed of the GPU. I suppose it’s my fault for going past the agreements and clicking the button, but there wasn’t a warning that the button would actually modify my clock speeds. I figured the Find Optimal button was going to check the model of my GPU and tell me what the optimal clock was from a database. Anyway, assuming that the clock was changed from factory defaults, why is it the same speed in Windows?

Also, I loaded up ATI tool and it shows 3 sliders. 2D is set at 150MHz, 3D min is 150MHz and 3D max is 450MHz. Can someone check if that’s what they are supposed to be? I don’t mind changing the clock again if I can just get the performance back to factory default. It seems odd that the Nvidia tool says the clock rates are at factory defaults and yet the Mac side is performing over 30% slower. This is around the same ratio between 450MHz and 350MHz.

Thing is, if I put the speed up by 30% so that the Mac side goes back to normal, this would surely mean the Windows side will go up 30% too. It would make perfect sense to me if the Find Optimal button had gone through various clock speeds and only managed to reach a lower setting before it crashed but only if both Mac and Windows were running at the same 30% slower speed. It makes no sense that one side is running 30% slower than the other on the same hardware settings especially when the Mac CUDA deviceDrv reports the same core clock.

Edit2: I see that someone has used the NVidia Control Panel to change the clock speed on a Macbook ok -

http://www.hardmac.com/news/2009/01/26/nvi…book-on-windows

They went all the way to 550MHz, 1200MHz. I don’t know how that affects the lifetime of the components, but if I’m experimenting a lot with CUDA, the extra speed boost would be good - plus a 60-70% increase in games means the difference between playable and not. I ramp the fan up faster anyway, but is there a safe clock speed for this GPU? I was thinking about using 450MHz, 1000MHz. Which clock is more important for CUDA, core or shader? CUDA 2.0 actually listed the shader clock in deviceDrv but 2.1 lists the core clock.

I also noticed something interesting in the ATI tool, which is the dynamic clock rate of the 9400M. It changes clock based on what it’s used for. When you turn on the 3D test view, the clock rate is reported as 450MHz. How would that be if the Nvidia tool lists the clock at 350MHz? It surely can’t dynamically go above what the Nvidia tool states. This dynamic switching apparently introduces some latency too. I don’t know if this is what’s happening but the Cinebench benchmark will stutter briefly at the start and then settle into smooth motion. The clock seems to ramp up in steps 150->350->450 or something like that. I wouldn’t put it on high performance all the time myself but it can be done. In the ATI tool, you’d probably just set the lower 2 sliders higher up.

After finding out about the dynamic clock rates, I decided to try running the CUDA deviceQuery while the GPU was rendering and it lists it as 1.1GHz instead of 0.3GHz. 0.3GHz wasn’t the 350MHz core clock shown in the ntune software, it’s the ramped down shader clock when the GPU isn’t being used for intense 3D stuff.

This is why there doesn’t seem to be a problem because this is the clock it’s supposed to be at when doing 3D stuff - 450MHz core, 1100MHz shader. But there’s something else slowing it down in the Mac system.

The ntune software can’t have done anything because there is an apply button, which I never pressed. It just ran through the test that sees what clock rate the GPU can handle.

Plus, as I say the Windows side is performing correctly and CUDA deviceQuery on the Mac side reports the correct shader clock so the GPU itself can’t be underclocked in firmware.

This is really bugging me now because there’s just no reason for this to be happening. If the kernel extensions are all back to normal and the caches rebuilt and the core/shader clocks are correct then why are the OpenGL and CUDA tests on the Mac side 30% slower? What is strange is that the OpenGL tests that don’t use shader computation seem to be fine.

This is why I’m thinking that the crash while using CUDA on the Mac side has done something. I just don’t know where to start looking for the solution.

Problem solved.

It turns out that it was neither running ntune nor the CUDA crash.

It was actually to do with an old firewire DVD burner I’d started using again. I was just trying everything I could think of to fix this slowdown and I almost convinced myself that it couldn’t possibly be the firewire burner but I unplugged it while the CUDA fluid demo was running and I couldn’t believe it when it shot up from 15fps to 26fps.

I ran Cinebench for about the 30th time in 2 days and sure enough, it scored 3900, where I had benchmarked it at 2600 just five minutes earlier with the firewire drive plugged in.

Just for kicks, I ran the Quake 4 demo and had it sitting at 45-50fps. I then plugged the drive in and it immediately dropped to 25-30fps. I’m using a Sonnet firewire 800 to 400 adaptor to connect this drive so I don’t know if it’s the drive or the adaptor but this must be screwing with the motherboard somehow.

Glad I didn’t do something stupid to fix this like actually overclock my GPU. Guess I’ll be looking for another external DVD burner.

Thanks for that last post describing the final solution.
WHY that would be a problem is a complete mystery but at least it’s a great clue for others who may google with the same problem!

phew. I was trying to figure out if we were doing something bizarre on the driver side. mystery solved!

I hope so. After reinstalling the OS, doing so many kernel extension resets and even an SMC reset, I’d say it should save people a lot of time if they first unplug all the devices from the machine.

Yeah, it’s very odd, especially considering it didn’t affect the Windows side. I guess Mac OS X 10.5 must be polling devices in a different way. Thanks for your help.

I was also puzzled by the 0.3GHz reported on my Mac mini. Thanks to your information now I can get the correct 1.1GHz number too :)

However, I found a bug related to this in the profiler. In some cases the profiler incorrectly uses the 0.3GHz figure to compute the gputime value. For example, my reduction kernel, which has its own timing function, reports that it took about 1 million cycles, yet the profiler log shows the gpu time as about 3300 microseconds. That puzzled me because it’s too slow for an 8MB dataset. After reading this thread, I tested it again with fluidsGL running in the background, and now the profiler log shows a correct gputime of about 880 microseconds.
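As a rough cross-check of those figures, this Python sketch converts a cycle count into elapsed time at the two shader clocks mentioned in this thread. It is plain arithmetic, not the profiler's actual method, and the names are hypothetical.

```python
def cycles_to_microseconds(cycles, clock_hz):
    """Convert a cycle count into elapsed microseconds at the given clock rate."""
    return cycles / clock_hz * 1e6


CYCLES = 1_000_000      # approximate cycle count reported by the kernel's own timer
IDLE_CLOCK_HZ = 0.3e9   # ramped-down shader clock reported when the GPU is idle
LOAD_CLOCK_HZ = 1.1e9   # shader clock reported while the GPU is busy rendering

idle_us = cycles_to_microseconds(CYCLES, IDLE_CLOCK_HZ)
load_us = cycles_to_microseconds(CYCLES, LOAD_CLOCK_HZ)

print(f"at 0.3GHz: {idle_us:.0f} microseconds")  # at 0.3GHz: 3333 microseconds
print(f"at 1.1GHz: {load_us:.0f} microseconds")  # at 1.1GHz: 909 microseconds
```

Both results sit close to the two gputime values seen in the profiler log, which fits the idea that the profiler was dividing by the idle clock in the first run.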