I’ve been running some tests of our algorithm across two (or more) GPUs, using two Tesla C1060 boards and the OpenCL 1.1 beta. I first distributed the job using 2 devices but 1 context, with the kernels launched from two different threads. My performance looks like this (CentOS 5.4, driver 258.19):
I looked through the OpenCL 1.1 spec, but I couldn’t find any guidance on how this code should be written. Does anyone have experience they’d like to share? From my numbers, the multiple-context route seems to be best.
Dan,
Nice dataset. I have always used 1 context per GPU, and it is good to see that someone has tried both ways. The couple of times I have mentioned doing it this way, I seemed to get pushback that it was unnecessary. When I first wrote this part of my system in 8/09, I figured it would give early drivers no choice but to do as I asked. The labor for either setup is almost identical, so that should not be a deciding factor.
Until your performance data was published, the only “advantage” I could actually find with 1 context per GPU (though I have yet to try this) is that you can use multiple platforms at the same time on the same problem, e.g. ATi and nVidia together. Kind of a parlor trick rather than a real advantage, though.
You might not get much feedback, but this could be because you ARE the feedback.
Just wanted to say thanks for this data point. I am about ready to start a multi-GPU design (9 GPUs) and was clueless as to how to go about it. This information is wonderful to have.