concurrentKernels SDK sample not running kernels concurrently on Fermi

My system runs 64-bit Ubuntu 9.10 with the latest CUDA 3.2 driver, toolkit, and SDK. I should mention that the graphics driver does not load correctly: it reports “(EE) No device found”, so I fall back to safe graphics mode. In that mode I can still run non-graphics samples such as deviceQuery and matrixMul.

I tried running the concurrentKernels sample to test the Fermi feature of running multiple kernels concurrently. I profiled it with computeprof, and the GPU time width plot shows all the kernels executing sequentially. I have attached a screenshot for reference. Also, I cannot compare the timings against sequential execution, because the sample computes the expected sequential time mathematically rather than measuring it. I am not sure whether the driver problem is the cause.
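For a measured baseline instead of a computed one, something like the following could be used. This is only a minimal sketch and not the SDK sample's code: the kernel name `spin`, the helper `timeLaunches`, the kernel count, and the cycle count are all placeholders chosen for illustration. It assumes the default (legacy) stream behaviour, where an event recorded in stream 0 waits for work in the other streams. It prints the device's `concurrentKernels` property, then times the same launches once in the default stream and once with one stream per kernel.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder kernel: busy-waits for roughly `cycles` GPU clock ticks.
// clock64() needs compute capability 2.0 or higher (Fermi qualifies).
__global__ void spin(long long cycles)
{
    long long start = clock64();
    while (clock64() - start < cycles) { }
}

// Launches nKernels copies of spin and returns the elapsed time in ms.
// With useStreams == false every launch goes to the default stream,
// so the kernels are serialized and give a measured sequential baseline.
static float timeLaunches(cudaStream_t *streams, int nKernels,
                          long long cycles, bool useStreams)
{
    cudaEvent_t start, stop;
    float ms = 0.0f;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, 0);
    for (int i = 0; i < nKernels; ++i) {
        cudaStream_t s = useStreams ? streams[i] : 0;
        spin<<<1, 1, 0, s>>>(cycles);
    }
    // Assumption: legacy default stream, so this event completes only
    // after the work in all the other streams has finished.
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);

    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}

int main()
{
    const int nKernels = 8;              // arbitrary, for illustration
    const long long cycles = 10000000LL; // arbitrary spin length per kernel

    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    printf("%s: concurrentKernels = %d\n", prop.name, prop.concurrentKernels);

    cudaStream_t streams[nKernels];
    for (int i = 0; i < nKernels; ++i)
        cudaStreamCreate(&streams[i]);

    float serialMs   = timeLaunches(streams, nKernels, cycles, false);
    float streamedMs = timeLaunches(streams, nKernels, cycles, true);

    printf("default stream only : %.3f ms\n", serialMs);
    printf("one stream per kernel: %.3f ms\n", streamedMs);

    for (int i = 0; i < nKernels; ++i)
        cudaStreamDestroy(streams[i]);
    return 0;
}
```

If the kernels really overlap, the second time should be noticeably smaller than the first; if both times are about the same, the kernels are being serialized even outside the profiler.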

Please help.