Got the GTX 285 working on my Mac Pro (2009), but I was expecting better performance.
I have a simple kernel that adds up a large array of floating-point numbers, similar to the scan sample program. On the GT 120 it sums 1,000,000 floats in 2.84 ms; on the GTX 285 it takes 2.40 ms. That's only about 15% less time (roughly a 1.18x speedup). Going from 32 cores to 240, I was expecting a bigger boost. :-(
The program uses a single 512-thread block. Would using more than one block improve performance?
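One likely factor, assuming the kernel really launches exactly one block: a CUDA thread block always runs on a single multiprocessor, so one 512-thread block keeps only one of the GTX 285's 30 multiprocessors busy (and one of the GT 120's 4). A common way to spread the work over the whole chip is to launch many blocks, have each block reduce a slice of the array to a partial sum, and combine the partial sums afterwards. Below is a minimal sketch of that generic two-pass reduction, not the original program; the names `partialSums` and `gpuSum`, the 256-thread block size and the 120-block grid are illustrative assumptions, and error checking is left out for brevity. Float `atomicAdd` isn't available on these compute capability 1.x cards, so the partial sums go to an array and are added on the host.

```cuda
#include <cassert>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

#define THREADS 256  // threads per block (assumed); must be a power of two for the tree reduction

// Each block reduces a strided slice of `in` to one value in partial[blockIdx.x].
__global__ void partialSums(const float* in, float* partial, int n)
{
    __shared__ float cache[THREADS];
    float sum = 0.0f;
    // Grid-stride loop: each thread accumulates every (blockDim.x * gridDim.x)-th element.
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += blockDim.x * gridDim.x)
        sum += in[i];
    cache[threadIdx.x] = sum;
    __syncthreads();
    // Tree reduction in shared memory: halve the number of active threads each step.
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (threadIdx.x < s)
            cache[threadIdx.x] += cache[threadIdx.x + s];
        __syncthreads();
    }
    if (threadIdx.x == 0)
        partial[blockIdx.x] = cache[0];
}

// Sums n floats: `blocks` blocks each produce a partial sum, then the host adds them up.
float gpuSum(const float* h_in, int n, int blocks)
{
    float *d_in, *d_partial;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_partial, blocks * sizeof(float));
    cudaMemcpy(d_in, h_in, n * sizeof(float), cudaMemcpyHostToDevice);

    partialSums<<<blocks, THREADS>>>(d_in, d_partial, n);

    std::vector<float> h_partial(blocks);
    cudaMemcpy(h_partial.data(), d_partial, blocks * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_partial);

    float total = 0.0f;
    for (float p : h_partial)
        total += p;
    return total;
}

int main()
{
    const int n = 1000000;               // same element count as in the post
    std::vector<float> h(n, 1.0f);       // all ones, so the exact sum is n
    float s = gpuSum(h.data(), n, 120);  // 120 blocks: an untuned guess (4 per multiprocessor on 30)
    printf("sum = %.1f\n", s);
    assert(s == (float)n);               // exact here: every partial sum is an integer below 2^24
    return 0;
}
```

The block count is something to measure rather than assume; a few blocks per multiprocessor is a reasonable starting point for the comparison between the two cards.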
Another observation: on the GT 120, performance is much better (twice as fast) when the display is attached to the GT 120, but with the GTX 285 it doesn't seem to matter. I've also noticed that with both the GT 120 and the GTX 285 installed, the GT 120 never gets more than 1.5 GB/s on memory copies to pinned memory. With just the GT 120 installed and the display attached, I can get over 5.5 GB/s.
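To compare the two cards' pinned-memory copy rates under identical conditions, here is a minimal sketch that times device-to-host copies into page-locked host memory on every installed GPU using CUDA events. It assumes "copies to pinned memory" means device-to-host; the function name `pinnedD2HBandwidthGBs`, the 64 MB buffer and the 10 repetitions are assumptions, and GB/s here means 10^9 bytes per second. The SDK's bandwidthTest sample covers more cases (both directions, pageable vs. pinned).

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// Times `reps` device-to-host copies of `bytes` into pinned host memory on the
// current device and returns the average rate in GB/s (1 GB = 1e9 bytes).
double pinnedD2HBandwidthGBs(size_t bytes, int reps)
{
    void *h_pinned, *d_buf;
    cudaMallocHost(&h_pinned, bytes);  // page-locked (pinned) host allocation
    cudaMalloc(&d_buf, bytes);
    cudaMemset(d_buf, 0, bytes);
    cudaMemcpy(h_pinned, d_buf, bytes, cudaMemcpyDeviceToHost);  // untimed warm-up copy

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, 0);
    for (int r = 0; r < reps; ++r)
        cudaMemcpy(h_pinned, d_buf, bytes, cudaMemcpyDeviceToHost);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);  // wait until the last copy has finished

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_buf);
    cudaFreeHost(h_pinned);

    return (double)bytes * reps / (ms / 1000.0) / 1e9;
}

int main()
{
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        cudaSetDevice(dev);  // measure each card separately
        double gbs = pinnedD2HBandwidthGBs(64u << 20, 10);  // 64 MB buffer, 10 copies (assumed)
        assert(gbs > 0.0);
        printf("device %d (%s): %.2f GB/s device-to-host, pinned\n", dev, prop.name, gbs);
    }
    return 0;
}
```

Running it once with both cards installed and once with only the GT 120 would show whether the drop to around 1.5 GB/s reproduces outside the original program.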