Single vs. Multiple contexts with multiple GPUs

Hi,

I’ve been running some tests of our algorithm across two (or more) GPUs, using 2 Tesla C1060 boards and the OpenCL 1.1 beta. I first distributed the job using 2 devices but 1 context, with kernels launched from two different threads. My performance looks like this (CentOS 5.4, driver 258.19):

[performance chart attached]
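In outline, that single-context setup looked like the sketch below: one context spanning both GPUs, one command queue per device, and each kernel launch issued from its own host thread. This is a stripped-down illustration (the kernel, buffer size, and error handling are placeholders), not the actual code:

```c
/* Minimal sketch: ONE context spanning both GPUs, one queue per device,
   kernels launched from two host threads. Names and sizes are illustrative. */
#include <CL/cl.h>
#include <pthread.h>

static const char *src =
    "__kernel void scale(__global float *x) {"
    "    size_t i = get_global_id(0);"
    "    x[i] *= 2.0f;"
    "}";

typedef struct {
    cl_context   ctx;   /* shared by both threads */
    cl_device_id dev;   /* this thread's GPU */
    cl_program   prog;  /* built once for both devices */
} worker_arg;

static void *worker(void *p)
{
    worker_arg *w = (worker_arg *)p;
    cl_int err;
    /* Each thread gets its own queue, but the queues share the one context. */
    cl_command_queue q = clCreateCommandQueue(w->ctx, w->dev, 0, &err);
    cl_kernel k = clCreateKernel(w->prog, "scale", &err);
    size_t n = 1 << 20;
    cl_mem buf = clCreateBuffer(w->ctx, CL_MEM_READ_WRITE,
                                n * sizeof(float), NULL, &err);
    clSetKernelArg(k, 0, sizeof(cl_mem), &buf);
    clEnqueueNDRangeKernel(q, k, 1, NULL, &n, NULL, 0, NULL, NULL);
    clFinish(q);
    clReleaseMemObject(buf);
    clReleaseKernel(k);
    clReleaseCommandQueue(q);
    return NULL;
}

int main(void)
{
    cl_platform_id plat;
    cl_device_id devs[2];
    cl_int err;
    clGetPlatformIDs(1, &plat, NULL);
    clGetDeviceIDs(plat, CL_DEVICE_TYPE_GPU, 2, devs, NULL); /* assumes 2 GPUs */
    /* The single context covers both devices. */
    cl_context ctx = clCreateContext(NULL, 2, devs, NULL, NULL, &err);
    cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, &err);
    clBuildProgram(prog, 2, devs, NULL, NULL, NULL);

    pthread_t t[2];
    worker_arg args[2];
    for (int i = 0; i < 2; ++i) {
        args[i].ctx = ctx; args[i].dev = devs[i]; args[i].prog = prog;
        pthread_create(&t[i], NULL, worker, &args[i]);
    }
    for (int i = 0; i < 2; ++i)
        pthread_join(t[i], NULL);

    clReleaseProgram(prog);
    clReleaseContext(ctx);
    return 0;
}
```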

However, when I created 2 contexts in 2 threads and ran the same code, my performance looks like this:

[performance chart attached]
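The two-context variant only moves the context, program, and queue creation into each thread, so every device gets a private context. Here is just the per-thread body, again simplified, reusing the includes and the kernel source string src from the sketch above:

```c
/* TWO-context variant: only the per-thread body changes; each host thread
   owns a private context, program, and queue for one device. */
static void *worker_own_ctx(void *p)
{
    cl_device_id dev = *(cl_device_id *)p;
    cl_int err;
    /* Private context for this device only. */
    cl_context ctx = clCreateContext(NULL, 1, &dev, NULL, NULL, &err);
    cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, &err);
    clBuildProgram(prog, 1, &dev, NULL, NULL, NULL);
    cl_command_queue q = clCreateCommandQueue(ctx, dev, 0, &err);
    cl_kernel k = clCreateKernel(prog, "scale", &err);
    size_t n = 1 << 20;
    cl_mem buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE,
                                n * sizeof(float), NULL, &err);
    clSetKernelArg(k, 0, sizeof(cl_mem), &buf);
    clEnqueueNDRangeKernel(q, k, 1, NULL, &n, NULL, 0, NULL, NULL);
    clFinish(q);
    clReleaseMemObject(buf);
    clReleaseKernel(k);
    clReleaseCommandQueue(q);
    clReleaseProgram(prog);
    clReleaseContext(ctx);
    return NULL;
}
```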

I looked through the OpenCL 1.1 spec, but I couldn’t find any guidance on how this code should be written. Anyone have experiences they would like to share? From my numbers, the multiple-context route clearly seems to be the better choice.

Cheers,
-dan

Dan,
Nice dataset. I have always done 1 context per GPU, and it is good to see someone has tried both ways. The couple of times that I have mentioned that I did it this way, I seemed to get pushback that this was unnecessary. When I first wrote this part of my system in 8/09, I thought it would give early drivers no choice but to do as I asked. The labor to do either setup is almost identical, so that should not be a deciding factor.

Until your performance data was published, the only “advantage” I could actually find with 1 context per GPU (though I have yet to try this) is that you can utilize multiple platforms at the exact same time, working on the same problem, e.g. ATi & nVidia. Kind of a parlor trick rather than a real advantage, though.
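For what it’s worth, that trick works because a cl_context cannot span platforms, so the per-GPU-context design extends naturally to one context per platform. An untested sketch of the enumeration (names are illustrative):

```c
/* Untested sketch: one context per platform, since a cl_context cannot
   span platforms (e.g. one AMD and one NVIDIA installation). */
#include <CL/cl.h>

int main(void)
{
    cl_platform_id plats[4];
    cl_uint nplat;
    clGetPlatformIDs(4, plats, &nplat);
    if (nplat > 4) nplat = 4; /* nplat reports all available platforms */
    for (cl_uint i = 0; i < nplat; ++i) {
        cl_device_id dev;
        if (clGetDeviceIDs(plats[i], CL_DEVICE_TYPE_GPU, 1, &dev, NULL) != CL_SUCCESS)
            continue; /* this platform has no GPU */
        cl_int err;
        /* Pin the context to this platform explicitly. */
        cl_context_properties props[] = {
            CL_CONTEXT_PLATFORM, (cl_context_properties)plats[i], 0
        };
        cl_context ctx = clCreateContext(props, 1, &dev, NULL, NULL, &err);
        /* ... build the program and enqueue work for this platform ... */
        clReleaseContext(ctx);
    }
    return 0;
}
```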

You might not get much feedback, but this could be because you ARE the feedback.

Thanks,

Jeff

Just wanted to say thanks for this data point. I am about to start a multi-GPU design (9 GPUs) and had no idea how to go about it. This information is wonderful to have.