I’ve got two NVIDIA GPUs:
- GeForce 8600M GT (4 multiprocessors, 0.75GHz clock rate)
- GeForce 8800 GT (14 multiprocessors, 1.5GHz clock rate)
Given these specs, I’d expect a kernel launched with enough thread blocks to run about 7 times faster on the 8800 GT than on the 8600M GT (twice the clock rate times 3.5 times as many multiprocessors).
In my tests, however, it “only” runs about 5 times faster.
I’m measuring only the raw kernel execution time, without any memcpy() operations.
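For reference, the 7× estimate is simply the clock ratio times the multiprocessor ratio. The sketch below spells that out and adds a deliberately simplified "waves" model to show why the block count matters. The model is an assumption, not how the hardware actually schedules work: it treats the kernel as purely compute-bound, with one block resident per multiprocessor and each block costing the same number of clock cycles. The names `ideal_speedup`, `wave_time` and `modeled_speedup` are hypothetical.

```python
import math

# Specs as quoted above: multiprocessor count and clock rate in GHz.
GPU_8600M_GT = {"sms": 4, "clock_ghz": 0.75}
GPU_8800_GT = {"sms": 14, "clock_ghz": 1.5}


def ideal_speedup(slow, fast):
    """Peak-throughput ratio: clock ratio times multiprocessor ratio."""
    return (fast["clock_ghz"] / slow["clock_ghz"]) * (fast["sms"] / slow["sms"])


def wave_time(gpu, num_blocks, block_cycles=1.0):
    """Hypothetical model: blocks run in 'waves' of one block per
    multiprocessor; each wave takes block_cycles / clock time units."""
    waves = math.ceil(num_blocks / gpu["sms"])
    return waves * block_cycles / gpu["clock_ghz"]


def modeled_speedup(slow, fast, num_blocks):
    """Ratio of modeled run times for the same launch on both GPUs."""
    return wave_time(slow, num_blocks) / wave_time(fast, num_blocks)


print(ideal_speedup(GPU_8600M_GT, GPU_8800_GT))                   # 7.0
print(round(modeled_speedup(GPU_8600M_GT, GPU_8800_GT, 16), 2))   # 4.0
print(round(modeled_speedup(GPU_8600M_GT, GPU_8800_GT, 1000), 2)) # 6.94
```

Under this toy model, a launch whose block count is not a multiple of both multiprocessor counts leaves some multiprocessors idle in the last wave, so the ratio only approaches 7× as the number of blocks grows.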
Any guesses why?
Thanks a lot!