Been testing this GPU in Linux, and WOW it is fast. Even when I use 64-bit integers it often outperforms the K20.
For example, the GTX 780 Ti takes only 8.17 seconds to generate all 13! permutations of an array in local memory (no evaluation of the permutations, just generating all of them with no repetition), while the K20 takes about 12 seconds for the same task.
For 14! the GTX 780 Ti takes 126.37 seconds, while the K20 takes 188 seconds.
The funny thing is that for other pieces of code, which consist mostly of 32-bit float operations, the GTX 780 Ti is not always faster; for some data shapes it is slower than the K20 by as much as 40%.
Is there a fundamental difference in the way the GTX 780 Ti handles integer operations compared to the K20?