I have been hacking at OpenCL for a couple of weeks, always on mobile GPUs, and was hoping for a major performance boost from the GTS250 I just bought. There was a boost, but some details left me wondering whether it shouldn't have been much bigger. I run the exact same kernel on a laptop with an 8400GM under Ubuntu 10.04 32-bit, and have now compiled it on a desktop with the GTS250 under Ubuntu 10.04 64-bit. The GTS250 is 10x faster than the 8400GM laptop, and 3x faster than the same kernel running on my MacBook Pro with a GT 330M.
But while CL_KERNEL_WORK_GROUP_SIZE is 384 on the two laptops, it drops to 192 on the desktop with the GTS250. The driver version is the same. Is the specific card to blame for this? What am I missing?
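For context, here is a minimal sketch of how host code might adapt to whichever limit a device reports, so the same program runs with both 384 and 192. The helper names `pick_local_size` and `round_up_global` are hypothetical, not part of OpenCL. The sketch assumes `kernel_max` holds the value obtained from `clGetKernelWorkGroupInfo` with `CL_KERNEL_WORK_GROUP_SIZE`, and that a 1-D range is being launched.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: clamp a preferred local (work-group) size to the
 * per-kernel limit. kernel_max stands for the value the runtime reports
 * for CL_KERNEL_WORK_GROUP_SIZE (384 or 192 in the cases above). */
size_t pick_local_size(size_t preferred, size_t kernel_max)
{
    assert(preferred > 0 && kernel_max > 0);
    return preferred < kernel_max ? preferred : kernel_max;
}

/* Hypothetical helper: round the global size up to the next multiple of
 * the local size. OpenCL 1.x rejects a launch whose global size is not
 * evenly divisible by an explicitly given local size. */
size_t round_up_global(size_t n, size_t local)
{
    assert(local > 0);
    return ((n + local - 1) / local) * local;
}
```

With these, a launch over `n` items would pass `pick_local_size(256, kernel_max)` as the local size and `round_up_global(n, local)` as the global size, and the kernel would need to skip work-items whose global id is `n` or larger.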