OpenCL on Windows much slower than on Mac? A simple convolution test

I tried both the 9400M and the 9600M GT on OS X. The results are similar, except that the 9600M GT only works in 32-bit (i386) mode. On XP, only the 9600M GT can be used; you can’t switch to the 9400M.

I mean, is the output the same?

Yes, the calculation result is the same.

What about other OpenCL programs, such as the SDK examples? Most likely it is a system issue.

What is interesting is that the whole time difference is on the “last line”.

I suspect that this is related to the Windows kernel, and that the cost is not in the last line itself but probably in the return of the data from the GPU.
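
To illustrate why the time can pile up on the “last line”, here is a minimal simulation. It is not OpenCL code: a thread pool stands in for the command queue, and `fake_kernel` and `time_async_pipeline` are made-up names for this sketch. The assumption is that the enqueue calls in the test return before the GPU work is finished, so a per-line wall-clock timer charges the whole cost to the first call that blocks.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_kernel(duration_s):
    # Stand-in for GPU work; a real kernel would run on the device.
    time.sleep(duration_s)
    return "result"


def time_async_pipeline(duration_s=0.2):
    """Return (enqueue_seconds, wait_seconds) for one simulated launch."""
    with ThreadPoolExecutor(max_workers=1) as queue:
        t0 = time.perf_counter()
        # Returns immediately, like a non-blocking enqueue call.
        pending = queue.submit(fake_kernel, duration_s)
        t1 = time.perf_counter()
        # Blocks until the work is done, like a blocking read-back.
        pending.result()
        t2 = time.perf_counter()
    return t1 - t0, t2 - t1


if __name__ == "__main__":
    enqueue_s, wait_s = time_async_pipeline()
    print(f"enqueue: {enqueue_s * 1000:.2f} ms")
    print(f"blocking wait: {wait_s * 1000:.2f} ms")
```

In the real program, one way to separate the two would be to call `clFinish` on the queue before timing the read-back, or to create the queue with `CL_QUEUE_PROFILING_ENABLE` and read `CL_PROFILING_COMMAND_START` and `CL_PROFILING_COMMAND_END` from each event with `clGetEventProfilingInfo`. That would show whether the kernel or the transfer back is the slow part on Windows.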