I wrote my code on an 8800 Ultra and got very positive results. Then I replaced the old card with a GTX 295, but found that a single GPU on the GTX 295 is slower than the 8800 Ultra.
8800 Ultra: 26.9s
GTX 295: 45.78s
It seems impossible!!!
I did some profiling to investigate this and found that cudaThreadSynchronize accounted for 43.71% of the duration. Does anyone have an idea about this?
It was caused by Vista. I don't want to say anything about my lovely Vista.
You might be doing double-precision math on the 295 by mistake; that is currently very slow (relative to float operations, that is).
Pretty impossible to tell with so few details. Are you using some of the SDK example projects to measure speeds?
8800 has 128 SPs at 1500MHz, 103GB/sec memory bandwidth
295 has 240 SPs at 1242MHz, 111GB/sec memory bandwidth. (For one half of the board)
If you have a very poorly designed kernel that’s not using enough blocks, you MIGHT find some case where the 8800’s clock rate of 1500 would give it a 20% speed boost over the 295.
But your timing shows something much worse.
If you just swapped the cards and did nothing else, perhaps your drivers are stale and don’t know how to deal with the 295 properly?