Scalability on NVIDIA platforms


I have a problem concerning scalability on NVIDIA platforms using OpenCL.

In the Khronos OpenCL forum I was told that nothing is wrong from the OpenCL spec's point of view.

In a multi-GPU environment, I create a command queue for each device. My expectation is that the execution of those command queues is more or less interleaved. However, on NVIDIA their execution is strictly sequential. Are there known issues, or is it my mistake?

I tested on a Tesla S1070 system, as well as on 2 x GeForce 8800 GTX and 2 x GeForce 9800 GX2. None of these systems scales.

If you need further details, I will provide them!

Thanks for your help!

Maybe this helps:

I.e., besides the multiple command queues, you should also create multiple host threads.

Hey eyebex,

thanks for the quick reply.

However, creating multiple host threads is not an option for me. In the scenario described, I am extending an existing framework, and retrofitting it to use multiple host threads would be too complicated.