OpenCL Multiple Kernels - Fermi GPUs: Can we run multiple kernels concurrently in OpenCL?


I was wondering whether concurrent kernel execution in OpenCL works on the GTX 480.

I create multiple command queues and enqueue the kernels on different queues.
Is there an example of this? I wrote my own, modeled on the “concurrent kernel” example from the CUDA SDK.

I read the start and end times from the OpenCL events, and the kernels' time intervals do not overlap.
I don’t insert any synchronization between the clEnqueueNDRangeKernel calls.
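
To make the check concrete, here is a minimal sketch of how I compare two kernels' time intervals. It assumes the timestamps are the nanosecond values read with clGetEventProfilingInfo (CL_PROFILING_COMMAND_START and CL_PROFILING_COMMAND_END) from queues created with CL_QUEUE_PROFILING_ENABLE. The names kernel_span, spans_overlap and overlap_ns are made up for illustration and are not part of OpenCL.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper type: one kernel's execution interval in nanoseconds.
   The values are assumed to come from clGetEventProfilingInfo with
   CL_PROFILING_COMMAND_START and CL_PROFILING_COMMAND_END. */
typedef struct {
    uint64_t start_ns;
    uint64_t end_ns;
} kernel_span;

/* True if the two intervals share any time. Intervals that merely touch
   (one ends exactly when the other starts) do not count as overlapping. */
bool spans_overlap(kernel_span a, kernel_span b)
{
    return a.start_ns < b.end_ns && b.start_ns < a.end_ns;
}

/* Length of the shared time in nanoseconds, or 0 if there is none. */
uint64_t overlap_ns(kernel_span a, kernel_span b)
{
    uint64_t lo = a.start_ns > b.start_ns ? a.start_ns : b.start_ns;
    uint64_t hi = a.end_ns < b.end_ns ? a.end_ns : b.end_ns;
    return hi > lo ? hi - lo : 0;
}
```

If the kernels really ran concurrently, I would expect spans_overlap to return true for at least some pairs of events taken from different queues. In my runs it is false for every pair.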

I even tried giving each kernel its own separate buffers, but still no luck.

Should the OpenCL event profiling interface show the overlap?

I saw an older post but there doesn’t seem to be any consensus there.

Thank You