Bad OpenCL performance on GTX580M

Hello,

I have an Alienware M17XR3 with a GTX580M. I have set it up with Ubuntu Lucid 32-bit and a 2.6.38 kernel (PAE backport from Natty). I tried the NVIDIA 280.13 driver, Bumblebee for Lucid, and Ironhide for Natty, but the OpenCL performance on the NVIDIA card is very bad. As a benchmark I used matrix-multiply.py from pyopencl. On Linux I get 13 GFLOPS, which is very slow. I set up Windows 7 today and ran matrix-multiply.py there, which gives me 115 GFLOPS. It seems that something is hitting the brakes on Linux. Does anyone have any idea what could cause this?
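
To make the numbers easier to compare, here is a rough sketch of how a GFLOPS figure for a dense matrix multiply is usually derived. It assumes the common convention of 2*m*n*k floating-point operations for an (m x k) by (k x n) product. The helper name matmul_gflops is hypothetical and is not taken from matrix-multiply.py.

```python
def matmul_gflops(m, n, k, seconds):
    """Return GFLOPS for an (m x k) @ (k x n) product.

    Assumption: one multiply and one add per inner-product term,
    i.e. 2 * m * n * k floating-point operations in total.
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    flops = 2.0 * m * n * k
    return flops / seconds / 1e9


# Ratio between the two measurements reported above
# (115 GFLOPS on Windows 7, 13 GFLOPS on Linux).
slowdown = 115.0 / 13.0
```

By that measure the Linux run is almost nine times slower than the Windows run on the same hardware.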

Thanks.