Why is a GTX680 even slower than a GTX480 when using CUDA?

I’ve tested several apps in the GPU Computing SDK, such as GrabCutNPP, radixSort, etc. Surprisingly, I found the GTX680 is even slower than my old GTX480 (about 0.9x the GTX480's speed). Why could this happen? In contrast, 3DMark11 reports the GTX680 as roughly 2x faster.
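For anyone who wants to reproduce this, a minimal way to time a single kernel with CUDA events looks like the sketch below; the scale kernel and problem size are just placeholders for illustration, not the actual SDK code:

    #include <cstdio>
    #include <cuda_runtime.h>

    // Placeholder kernel; the SDK samples (radixSort, GrabCutNPP) do real work.
    __global__ void scale(float *d, int n, float f)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) d[i] *= f;
    }

    int main()
    {
        const int n = 1 << 24;
        float *d;
        cudaMalloc(&d, n * sizeof(float));

        cudaEvent_t start, stop;
        cudaEventCreate(&start);
        cudaEventCreate(&stop);

        // Warm up once so driver/JIT overhead doesn't skew the measurement.
        scale<<<(n + 255) / 256, 256>>>(d, n, 2.0f);

        cudaEventRecord(start);
        scale<<<(n + 255) / 256, 256>>>(d, n, 2.0f);
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);

        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);
        printf("kernel time: %.3f ms\n", ms);

        cudaEventDestroy(start);
        cudaEventDestroy(stop);
        cudaFree(d);
        return 0;
    }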

The installed driver is 301.10, with CUDA Toolkit 4.2. My OS is Windows 7 SP1. I even compiled the code for compute_30 and sm_30, but the result stayed the same.
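For clarity, the compile line I mean looks something like the following (the file name is just an example):

    # Target Kepler (sm_30) directly, and also embed PTX so the driver can JIT if needed.
    nvcc -gencode arch=compute_30,code=sm_30 \
         -gencode arch=compute_30,code=compute_30 \
         -O2 -o radixSort radixSort.cu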

PS: I couldn’t find a developer driver that supports the GTX680.

NVIDIA geared the GTX680 towards gaming instead of compute.
See the official NVIDIA forums for a larger discussion.

It remains to be seen whether NVIDIA has pulled a bait-and-switch and we will all need to move to Teslas, or whether, at some point, the 680’s big brother will show up on the gaming side and be faster than the 580.

It is indeed true that the Fermi architecture is more geared towards CUDA and compute; as stated above, perhaps the speculated GK110 will be more along the lines of its GF100/GF110 predecessors.

-Hooks

There are some known problems with the GTX680 in GPGPU computing: http://parallelis.com/kepler-and-gtx-680-worst-than-expected-on-gpgpu/.

But I hope this is a driver issue and will be solved soon.

I really want to dig into some microbenchmarking to understand if this is a compiler problem due to the switch to static instruction scheduling, or something else. Unfortunately, all these gamers have snarfed up the supply of GTX 680s, and mine is now back ordered until the first week in May. :)
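The sort of microbenchmark I have in mind is a clock64()-based chain of dependent arithmetic run by a single thread, something like the sketch below (kernel name and iteration count are arbitrary), to see whether per-instruction latency changed with the move to static scheduling:

    #include <cstdio>
    #include <cuda_runtime.h>

    // Chain of dependent multiply-adds; each iteration depends on the previous
    // one, so elapsed clocks / ITERS approximates per-instruction latency.
    #define ITERS 4096

    __global__ void latency_probe(float *out, long long *cycles, float seed)
    {
        float x = seed;
        long long t0 = clock64();
        #pragma unroll 16
        for (int i = 0; i < ITERS; ++i)
            x = x * 0.999f + 0.001f;   // dependent FMA chain
        long long t1 = clock64();
        *out = x;                      // keep the result live so the loop isn't optimized away
        *cycles = t1 - t0;
    }

    int main()
    {
        float *d_out;
        long long *d_cycles;
        cudaMalloc(&d_out, sizeof(float));
        cudaMalloc(&d_cycles, sizeof(long long));

        latency_probe<<<1, 1>>>(d_out, d_cycles, 1.0f);   // single thread: pure latency, no overlap
        cudaDeviceSynchronize();

        long long cycles = 0;
        cudaMemcpy(&cycles, d_cycles, sizeof(long long), cudaMemcpyDeviceToHost);
        printf("~%.2f cycles per dependent FMA\n", (double)cycles / ITERS);

        cudaFree(d_out);
        cudaFree(d_cycles);
        return 0;
    }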

I too am very curious whether code tuned specifically for Kepler can perform better than what people have been reporting in the forums, and whether future revisions of CUDA 4.2 will provide better performance. But I have been unable to get access to a card yet either.