How to use multiple devices concurrently? Profiler says that only one device is active at any time


I ran the “oclSimpleMultiGPU” example on Vista 64-bit with two GeForce 280s. The program runs fine and both devices are used for the computation. However, the devices are never used simultaneously. I ran the program under the Visual OpenCL Profiler, whose “GPU time width plot” shows the workload for each device, and I have never managed to get both devices busy at the same time: the bars for the two devices never overlap.

How can I get two devices to execute a kernel simultaneously? Have I overlooked a magic compiler flag, or some documentation about this for the current SDK version?


Are you using the 3.0 release SDK? I think I remember reading in one of the release notes that multi-GPU does not work correctly on Windows 7 and Vista (execution is serialized), but I don’t see this in the 3.0 release notes. Perhaps it was in the beta version?

What drivers are you using?

Thank you for your response.

Sorry, I should have mentioned that: I’m using the latest Toolkit and SDK (3.0, final) and Driver (DevDriver for Vista 64, Version 197.13).

You’re right: In the 3.0 BETA release notes it was mentioned as one of the Known Issues:

So unfortunately this seems not to be resolved in the final 3.0 version…?

Is this still an issue?

I am seeing the same problem… See my post at

---- UPDATE ----

To whom it may concern:

I was able to get two devices working together by creating multiple contexts, one for each device. As I understand the OpenCL spec, this should not be necessary, and it is verifiably not required on AMD, but it seems to work in practice on NVIDIA. The catch is that I now seem to run out of memory, presumably because each context carries significant overhead…

Hope this helps someone out there.