OpenCL multiple devices

Hi!

I have created a context with two devices on a Quadro 600. When I use one context I get no latency in the clEnqueueNDRangeKernel call, but with two queues the call itself takes at least 10 times longer.

The profiled timings of the events are still the same, so why is the call itself that much slower?
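
To make the comparison concrete, here is a minimal sketch of how the host-side cost of a single call can be timed separately from the device-side event timings. It is not the code from the actual program. The names `time_host_call_ms`, `timed_call_fn`, `ns_interval_to_ms` and `fake_enqueue` are hypothetical helpers made up for illustration, and `fake_enqueue` is only a stand-in for the real enqueue call. It assumes a POSIX system with `clock_gettime`.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdio.h>
#include <time.h>

/* Monotonic wall-clock time in milliseconds (POSIX clock_gettime). */
static double now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1e3 + ts.tv_nsec / 1e6;
}

/* Hypothetical callback type: wraps whatever call is being timed,
 * for example a function that calls clEnqueueNDRangeKernel. */
typedef int (*timed_call_fn)(void *user_data);

/* Returns the host-side duration of one call in milliseconds.
 * The call's return code is stored in *status if status is not NULL. */
static double time_host_call_ms(timed_call_fn fn, void *user_data, int *status)
{
    double t0 = now_ms();
    int rc = fn(user_data);
    double t1 = now_ms();
    if (status) {
        *status = rc;
    }
    return t1 - t0;
}

/* Converts a pair of nanosecond timestamps to milliseconds.
 * OpenCL event profiling values are reported in nanoseconds. */
static double ns_interval_to_ms(unsigned long long start_ns,
                                unsigned long long end_ns)
{
    return (double)(end_ns - start_ns) / 1e6;
}

/* Stand-in for the real enqueue call (assumption: returns 0 on success). */
static int fake_enqueue(void *user_data)
{
    (void)user_data;
    return 0;
}
```

In a real program the callback would wrap clEnqueueNDRangeKernel, and the two timestamps passed to `ns_interval_to_ms` would come from clGetEventProfilingInfo with CL_PROFILING_COMMAND_START and CL_PROFILING_COMMAND_END, on a queue created with CL_QUEUE_PROFILING_ENABLE. The first number is the time the host spends inside the call; the second is the kernel execution time on the device, which is the part that stays the same in my measurements.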

Correction: I meant when I use only one device in the context. The bad performance occurs when I use two devices in the same context.