Anyone else notice a slight performance decrease with CUDA 6.0?

Going from CUDA 5.5 to CUDA 6.0 on the same hardware, I noticed a 5-7% decrease in 32-bit floating-point GFLOPS for the exact same operations. Even CUDA-Z is showing slightly lower numbers.

Not really that big a deal, but I think I remember some others on this board saying the same thing, so I am wondering if there are any other anecdotal reports of performance differences between the two versions.

This is on a system with a single GTX 780 Ti running Windows 7.

It is a big deal. If you can demonstrate that the exact same code runs 5-7% slower on CUDA 6 vs. CUDA 5.5, I’m sure NVIDIA would like to know about it, especially if you can provide a short, simple, complete reproducer code. You can file a bug, or you can post it here if it’s simple enough. I’m sure someone would look at it.

If you’re willing to invest more time in it, I would also suggest testing with CUDA 6.5RC, in case something was rectified in the meantime.
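
For what it's worth, a reproducer doesn't have to be elaborate. Something along these lines, built once with each toolkit and compared, would already be useful (the dependent-FMA loop is just a stand-in workload I made up, not anything specific to your case):

#include <cstdio>
#include <cuda_runtime.h>

// Dependent FMAs: a simple stand-in FP32 workload for timing purposes.
__global__ void fma_loop(float *out, int iters)
{
    float a = threadIdx.x * 0.001f;
    const float b = 1.0001f, c = 0.5f;
    for (int i = 0; i < iters; ++i)
        a = a * b + c;
    out[blockIdx.x * blockDim.x + threadIdx.x] = a;
}

int main()
{
    const int blocks = 1024, threads = 256, iters = 1 << 20;
    float *d_out;
    cudaMalloc(&d_out, blocks * threads * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    fma_loop<<<blocks, threads>>>(d_out, iters);      // warm-up launch
    cudaEventRecord(start);
    fma_loop<<<blocks, threads>>>(d_out, iters);      // timed launch
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    double flops = 2.0 * blocks * threads * (double)iters;   // 1 FMA = 2 flops
    printf("%.3f ms, %.1f GFLOP/s\n", ms, flops / ms / 1e6);

    cudaFree(d_out);
    return 0;
}

I would run it several times back to back and compare the best times from each toolkit, so that boost-clock and thermal variation don't muddy the comparison.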

I am hesitant to blame CUDA 6.0 because the GPU used is EVGA's SC version of the GTX 780 Ti, which has a varying boost clock.

What is weird is that with CUDA 6.0 I get about the same memory bandwidth (measured via a sum reduction).
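
For context, the bandwidth figure is just bytes read divided by kernel time. A stripped-down sketch of that kind of measurement (using thrust instead of my actual reduction kernel, so purely illustrative):

#include <cstdio>
#include <cuda_runtime.h>
#include <thrust/device_vector.h>
#include <thrust/reduce.h>

int main()
{
    const size_t N = 1 << 26;                          // 64M floats = 256 MB read
    thrust::device_vector<float> d(N, 1.0f);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    float sum = thrust::reduce(d.begin(), d.end());    // warm-up
    cudaEventRecord(start);
    sum = thrust::reduce(d.begin(), d.end());          // timed reduction
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    double gbs = (N * sizeof(float)) / (ms * 1e6);     // effective GB/s read
    printf("sum = %f, %.1f GB/s effective\n", sum, gbs);
    return 0;
}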

But when I run some of my other benchmarking code, like my permutation of 13-15 element int arrays, I get very different running times.

When I just generate the permutations in local memory but do not evaluate them, the CUDA 6.0 build is noticeably faster (6.3 seconds for 13! with CUDA 6.0 vs. 6.9 seconds with CUDA 5.5).

When I run the version which generates all permutations and evaluates each one, the CUDA 5.5 build is faster at 8.9 seconds, while the CUDA 6.0 build takes 9.3 seconds.
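
To give an idea of what that benchmark does (this is a simplified sketch, not my actual kernel): each thread decodes a global index into one permutation in local memory using the usual factorial-number-system (Lehmer code) approach, something like:

#include <cstdio>
#include <cuda_runtime.h>

// Decode a permutation index into an n-element permutation (factorial number system).
__device__ void index_to_perm(unsigned long long idx, int n, int *perm)
{
    int elems[16];
    for (int i = 0; i < n; ++i) elems[i] = i;
    for (int i = 0; i < n; ++i) {
        unsigned long long f = 1;
        for (int j = 2; j <= n - 1 - i; ++j) f *= j;   // (n-1-i)!
        int k = (int)(idx / f);
        idx %= f;
        perm[i] = elems[k];
        for (int j = k; j < n - 1 - i; ++j)            // remove the chosen element
            elems[j] = elems[j + 1];
    }
}

__global__ void gen_perms(unsigned long long total, int n, unsigned long long *sink)
{
    unsigned long long tid    = blockIdx.x * (unsigned long long)blockDim.x + threadIdx.x;
    unsigned long long stride = (unsigned long long)gridDim.x * blockDim.x;
    unsigned long long acc = 0;
    int perm[16];                                      // dynamically indexed, so it lives in local memory
    for (unsigned long long i = tid; i < total; i += stride) {
        index_to_perm(i, n, perm);
        acc += perm[0];                                // touch the result so it isn't optimized away
    }
    sink[tid] = acc;
}

int main()
{
    const int n = 13;
    unsigned long long total = 1;
    for (int i = 2; i <= n; ++i) total *= i;           // 13! = 6,227,020,800
    const int blocks = 2048, threads = 256;

    unsigned long long *d_sink;
    cudaMalloc(&d_sink, blocks * threads * sizeof(unsigned long long));
    gen_perms<<<blocks, threads>>>(total, n, d_sink);
    cudaDeviceSynchronize();
    printf("done: %llu permutations of %d elements\n", total, n);
    cudaFree(d_sink);
    return 0;
}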

The matrixMulCUBLAS sample from the CUDA 6.0 SDK is also slightly slower than the CUDA 5.5 version, and CUDA-Z shows the same small drop.
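
The cuBLAS comparison essentially boils down to timing one SGEMM and converting to GFLOP/s, roughly like this (matrix size here is arbitrary and this is not the sample's exact code):

#include <cstdio>
#include <cuda_runtime.h>
#include <cublas_v2.h>

int main()
{
    const int n = 4096;                                // arbitrary square size
    const size_t bytes = (size_t)n * n * sizeof(float);
    float *A, *B, *C;
    cudaMalloc(&A, bytes);
    cudaMalloc(&B, bytes);
    cudaMalloc(&C, bytes);
    cudaMemset(A, 0, bytes);                           // contents don't matter for a throughput test
    cudaMemset(B, 0, bytes);

    cublasHandle_t handle;
    cublasCreate(&handle);
    const float alpha = 1.0f, beta = 0.0f;

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // warm-up, then a timed run
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n, &alpha, A, n, B, n, &beta, C, n);
    cudaEventRecord(start);
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n, &alpha, A, n, B, n, &beta, C, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    double gflops = 2.0 * n * (double)n * n / (ms * 1e6);   // 2*n^3 flops
    printf("SGEMM %d x %d: %.2f ms, %.1f GFLOP/s\n", n, n, ms, gflops);

    cublasDestroy(handle);
    cudaFree(A); cudaFree(B); cudaFree(C);
    return 0;
}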

So overall there probably is not too much difference, but for some specific tasks it seems that the two CUDA versions do differ.