I’m seeing weird behavior. In a test app, kernel enqueues and memory reads block, i.e. the calls return only once the operation has completed on the GPU, when I have two 1080 cards installed. If I remove one card, the calls are non-blocking and the GPU executes the work during clFinish, as it should.
I profile this with Nsight 5.2 and by dumping the execution times of the kernel enqueue calls and of clFinish.
I create the context with only one card and use only one queue.
Is this how it is supposed to work? Is there a way to get non-blocking OpenCL operations with 2 cards installed?
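
For anyone who wants to reproduce the timing check, here is a minimal sketch of the idea, not the actual test app. It times the enqueue step and the finish step separately and flags the enqueue as blocking when most of the wall time is spent there. All names (`time_enqueue_and_finish`, `enqueue_looks_blocking`, `sim_work`, `sim_noop`) are hypothetical. The two `sim_*` functions only simulate a driver with sleeps so the sketch runs without a GPU. In a real test, the callbacks would wrap `clEnqueueNDRangeKernel` or `clEnqueueReadBuffer` (with `blocking_read` set to `CL_FALSE`) and `clFinish`.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Wall-clock time in milliseconds from a monotonic clock. */
static double now_ms(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1e3 + ts.tv_nsec / 1e6;
}

/* Hypothetical callback type: one step of the test (enqueue or finish). */
typedef void (*step_fn)(void *user);

typedef struct {
    double enqueue_ms; /* time spent inside the enqueue step */
    double finish_ms;  /* time spent inside the finish step  */
} split_timing;

/* Run the enqueue step, then the finish step, timing each separately. */
static split_timing time_enqueue_and_finish(step_fn enqueue, step_fn finish,
                                            void *user) {
    assert(enqueue != NULL && finish != NULL);
    split_timing t;
    double t0 = now_ms();
    enqueue(user);
    double t1 = now_ms();
    finish(user);
    double t2 = now_ms();
    t.enqueue_ms = t1 - t0;
    t.finish_ms = t2 - t1;
    return t;
}

/* Heuristic (an assumption of this sketch): the enqueue is treated as
   blocking if it took longer than the wait in the finish step. */
static int enqueue_looks_blocking(split_timing t) {
    return t.enqueue_ms > t.finish_ms;
}

/* Simulation stand-in: sleeps for *(int *)user milliseconds (< 1000),
   playing the role of whichever call actually waits for the GPU. */
static void sim_work(void *user) {
    struct timespec d = {0, (long)(*(int *)user) * 1000000L};
    nanosleep(&d, NULL);
}

/* Simulation stand-in: returns immediately. */
static void sim_noop(void *user) {
    (void)user;
}
```

Calling `time_enqueue_and_finish(sim_work, sim_noop, &ms)` imitates the two-card case, where the time shows up in the enqueue. Swapping the two callbacks imitates the one-card case, where the time shows up in clFinish.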