Disable cores to benchmark performance

Does anybody know whether I can disable cores in a CUDA run?

Problem is, I have a rather old 8600GT with 32 cores and a GTX280 with 240 cores.
My app runs 8.5 times faster on the second one.
That seems reasonable for 7.5 times more cores and better coalesced reads/writes.

BUT: I just found out that memory bandwidth is 7 GB/s on the older one and 110 GB/s on the new one.
Now I would like to know how much of the speed-up comes from the faster global memory access and how much from the larger number of cores.

If I could restrict the GTX280 to use only 32 cores I could get a figure!

Any ideas how to do this… or to solve my problem otherwise?

Thanks and Cheers,

Change the memory and/or core frequencies. That is, if it works for you without crashing. For me it only works with versions before (I think) CUDA 2.0 beta 2.
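One way to read the results of such a clock experiment, as a rough sketch: time the kernel at two settings of one clock and check how strongly the runtime follows the change. The helper name `clock_sensitivity` and the sample timings are made up for illustration; they are not measurements from this thread.

```python
import math

def clock_sensitivity(time_a, clock_a, time_b, clock_b):
    """Return how strongly runtime follows a clock change.

    Near 1.0: runtime scales inversely with the clock, so the kernel
    is probably limited by whatever that clock drives.
    Near 0.0: runtime ignores the clock, so the limit is elsewhere.
    """
    if min(time_a, clock_a, time_b, clock_b) <= 0 or clock_a == clock_b:
        raise ValueError("need positive values and two different clocks")
    return math.log(time_b / time_a) / math.log(clock_a / clock_b)

# Hypothetical example: memory clock lowered from 1100 MHz to 880 MHz,
# kernel time rose from 10.0 ms to 12.5 ms.
memory_sensitivity = clock_sensitivity(10.0, 1100.0, 12.5, 880.0)
print(f"memory clock sensitivity: {memory_sensitivity:.2f}")
```

Repeating the same measurement with the core/shader clock gives a second number; comparing the two hints at which resource the kernel leans on more.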

Gracias for the tip, Reimar…

Any tool you would recommend for the job?



Uh. Just the applications NVIDIA has for that purpose? E.g. the control panel (well, the thing that replaced it nowadays) on Windows, or nvidia-settings on Linux (maybe you still have to set the Coolbits option in xorg.conf). There is also the non-NVIDIA nvclock, which might work without an X server running, though I never tried.

The simplest (and only) way to disable cores on a GTX 280 is to run on GTX 260 ;) But then, that has lower memory bandwidth, too… And probably for a reason (see discussion below)

I hate to say it, but even if you could disable MPs, I doubt that you would get the information you want. The hardware is most certainly designed to distribute that 110 GiB/s of bandwidth among all the MPs. There is probably no way that fewer MPs could sustain the same bandwidth.

A different (not necessarily better) way to test whether you are compute or memory bound is to also measure the GFLOP/s that your kernel achieves. If you get ~40 GFLOP/s and 110 GiB/s, then you are most certainly memory bound. But if you were to get, say, ~500 GFLOP/s and 1 GiB/s, then you are probably compute bound. With ~500 GFLOP/s and 110 GiB/s, you have the perfect mix for the GPU: great!
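The rule of thumb above can be sketched in a few lines. This is only an illustration: the function names, the 0.5 "balanced" threshold, and the peak figures (roughly 933 GFLOP/s and 141.7 GB/s, assumed here as GTX 280 spec-sheet values) are assumptions and should be checked against your own card.

```python
def achieved_rates(flops, bytes_moved, seconds):
    """Return (GFLOP/s, GB/s) for one timed kernel run."""
    return flops / seconds / 1e9, bytes_moved / seconds / 1e9

def likely_bound(gflops, gbytes, peak_gflops, peak_gbytes, balanced=0.5):
    """Guess the bottleneck by comparing each rate to its assumed peak."""
    compute_fraction = gflops / peak_gflops
    memory_fraction = gbytes / peak_gbytes
    # Arbitrary cut-off: both resources more than half used counts as balanced.
    if compute_fraction > balanced and memory_fraction > balanced:
        return "balanced"
    return "memory" if memory_fraction > compute_fraction else "compute"

# Assumed peak figures for a GTX 280; replace with your card's numbers.
PEAK_GFLOPS = 933.0
PEAK_GBYTES = 141.7

# The three cases mentioned in the post.
for gflops, gbytes in [(40, 110), (500, 1), (500, 110)]:
    verdict = likely_bound(gflops, gbytes, PEAK_GFLOPS, PEAK_GBYTES)
    print(f"{gflops} GFLOP/s, {gbytes} GB/s: {verdict}")
```

To use `achieved_rates`, count the floating-point operations and the bytes read plus written by one kernel launch by hand, then divide by the measured kernel time.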

Anyways, just from seeing the 110 GiB/s on the GTX 280 I would guess with 99% certainty that your kernel is memory bound.

You miscalculated the 8600GT’s bandwidth by more than half (theoretical is >22 GB/s). NVIDIA GPUs retain the same balance between compute and memory bandwidth across performance levels, which is why using a card like an 8600GT still lets you easily project performance to a higher-spec’d card, as you have discovered.
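For reference, a theoretical figure like this is usually derived from the memory clock and bus width. The sketch below assumes a 700 MHz GDDR3 memory clock and a 128-bit bus for the 8600GT; those are commonly listed reference values, not numbers from this thread, and individual boards may ship with different memory.

```python
def theoretical_bandwidth_gb_s(memory_clock_mhz, bus_width_bits,
                               transfers_per_clock=2):
    """Peak memory bandwidth in GB/s (1 GB = 1e9 bytes).

    transfers_per_clock=2 models double-data-rate memory (DDR/GDDR3).
    """
    bytes_per_transfer = bus_width_bits / 8
    transfers_per_second = memory_clock_mhz * 1e6 * transfers_per_clock
    return transfers_per_second * bytes_per_transfer / 1e9

# Assumed reference specs for an 8600GT: 700 MHz memory clock, 128-bit bus.
peak_8600gt = theoretical_bandwidth_gb_s(700, 128)

# 7 GB/s is the measured figure quoted earlier in the thread.
measured = 7.0
print(f"theoretical: {peak_8600gt:.1f} GB/s")
print(f"measured is {measured / peak_8600gt:.0%} of theoretical")
```

Measured device-to-device numbers normally land below the theoretical peak, so the interesting question is how far below, not whether the two match.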

Dear Alex,

thanks for your helpful reply!

Actually I did not “calculate” the bandwidth but just started the “bandwidthTest.exe” from the CUDA SDK.

I am using a passively cooled card; maybe that explains the difference from the “normal” 8600GT specs?

Best regards,


Passively cooled cards shouldn’t have their memory underclocked so much. Strange. On my 8600GT, btw, I get 15.5 GB/s device-to-device.