Problems with Fermi and AsyncAPI

Hi,

I am not able to enable asynchronous transfers on Fermi cards (GTX 480 and Tesla C2050).

The output of the asyncAPI sample on each card is:

---- OUTPUT ON GTX 480 FOR ASYNCAPI ----

[asyncAPI]
CUDA device [GeForce GTX 480]
time spent executing by the GPU: 33.44
time spent by CPU in CUDA calls: 33.44
CPU executed 15 iterations while waiting for GPU to finish

[asyncAPI] -> Test Results:
PASSED

---- OUTPUT ON TESLA C2050 FOR ASYNCAPI ----

[asyncAPI]
CUDA device [Tesla C2050]
time spent executing by the GPU: 22.75
time spent by CPU in CUDA calls: 22.75
CPU executed 30 iterations while waiting for GPU to finish

[asyncAPI] -> Test Results:
PASSED

In both cases the time spent by the CPU in CUDA calls is the same as the time spent executing by the GPU, which to me shows that the calls are not returning asynchronously.
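For reference, here is a minimal sketch of the measurement as I understand it. This is not the SDK sample's source: the kernel name `increment`, the buffer size, and the `CHECK` macro are my own placeholders. It queues a host-to-device copy, a kernel, and a device-to-host copy, times how long the CPU spends issuing those calls, and then counts CPU loop iterations while polling the stop event. As far as I know, `cudaMemcpyAsync` can only overlap with host work when the host buffer is page-locked, so the sketch allocates it with `cudaMallocHost`.

```cuda
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder error check: print the failing call and exit from main.
#define CHECK(call)                                                        \
    do {                                                                   \
        cudaError_t e = (call);                                            \
        if (e != cudaSuccess) {                                            \
            fprintf(stderr, "%s failed: %s\n", #call,                      \
                    cudaGetErrorString(e));                                \
            return 1;                                                      \
        }                                                                  \
    } while (0)

// Hypothetical kernel: adds a constant to every element.
__global__ void increment(int *data, int value, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += value;
}

int main()
{
    const int n = 16 * 1024 * 1024;          // assumed problem size
    const size_t bytes = n * sizeof(int);

    int *h_a = NULL, *d_a = NULL;
    CHECK(cudaMallocHost((void **)&h_a, bytes));  // page-locked host buffer
    CHECK(cudaMalloc((void **)&d_a, bytes));
    for (int i = 0; i < n; ++i) h_a[i] = 0;

    cudaEvent_t start, stop;
    CHECK(cudaEventCreate(&start));
    CHECK(cudaEventCreate(&stop));

    // Time only the CPU side of issuing the work to the default stream.
    auto cpu_begin = std::chrono::steady_clock::now();
    CHECK(cudaEventRecord(start, 0));
    CHECK(cudaMemcpyAsync(d_a, h_a, bytes, cudaMemcpyHostToDevice, 0));
    increment<<<(n + 255) / 256, 256>>>(d_a, 26, n);
    CHECK(cudaGetLastError());
    CHECK(cudaMemcpyAsync(h_a, d_a, bytes, cudaMemcpyDeviceToHost, 0));
    CHECK(cudaEventRecord(stop, 0));
    auto cpu_end = std::chrono::steady_clock::now();

    // Count CPU iterations while the GPU is still working.
    unsigned long counter = 0;
    cudaError_t status;
    while ((status = cudaEventQuery(stop)) == cudaErrorNotReady) ++counter;
    CHECK(status);

    float gpu_ms = 0.0f;
    CHECK(cudaEventElapsedTime(&gpu_ms, start, stop));
    double cpu_ms =
        std::chrono::duration<double, std::milli>(cpu_end - cpu_begin).count();

    // Verify the result after the stop event has completed.
    bool ok = true;
    for (int i = 0; i < n; ++i) ok = ok && (h_a[i] == 26);

    printf("time spent executing by the GPU: %.2f\n", gpu_ms);
    printf("time spent by CPU in CUDA calls: %.2f\n", cpu_ms);
    printf("CPU executed %lu iterations while waiting for GPU to finish\n",
           counter);
    printf("%s\n", ok ? "PASSED" : "FAILED");

    CHECK(cudaEventDestroy(start));
    CHECK(cudaEventDestroy(stop));
    CHECK(cudaFreeHost(h_a));
    CHECK(cudaFree(d_a));
    return ok ? 0 : 1;
}
```

If the calls were truly asynchronous, I would expect the CPU time to be much smaller than the GPU time and the iteration count to be large.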

Is there a flag that has to be set to enable asynchronous calls?

Thanks