I ran the “oclSimpleMultiGPU” example on Vista 64-bit with two GeForce 280s. The program runs fine, and both devices are used for the computation. However, the devices are never used simultaneously. I ran the program under the Visual OpenCL Profiler, and its “GPU time width plot” shows the workloads for both devices. But I have never managed to reach a state where both devices are busy at the same time: the bars for the two devices in the plot never overlap.
How can I get two devices to compute a kernel simultaneously? Have I overlooked a magic compiler flag, or some documentation about this for the current SDK version?