Dual 1080 makes OpenCL calls blocking


I’m seeing strange behavior. In a test app, kernel enqueues and memory reads are blocking when I have two 1080s installed, i.e. the calls return only after the operation has completed on the GPU. If I remove one card, they are non-blocking and the GPU executes the work during clFinish, as it should.

I profile them with Nsight 5.2 and by logging the execution times of the kernel enqueue calls and of clFinish.
I create the context with only one card and use a single command queue.

Is this how it is supposed to work? Is there a way to get non-blocking OpenCL operations with two cards installed?

It turns out that this has nothing to do with having one or two 1080s. clEnqueueReadBuffer simply blocks, as if I had passed CL_TRUE as the third parameter (blocking_read), even though I don’t.