Using multiple devices

Hi,

I’d like to make efficient use of the two devices I have access to. What is the best way of doing so?

So far I’ve tried using two queues on the same context, enqueueing the same kernel with the same buffer on each queue, and managing the buffer indices inside the kernel. To be clear: 1 context, 1 kernel, 1 output buffer, with each device writing to different indices. This was not efficient (it was actually noticeably slower than using a single device).
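To make the index split concrete, here is a minimal sketch of the kind of partitioning I mean. It is only an illustration: `device_slice` and `slice_for_device` are hypothetical names (not part of the OpenCL API), and the even split assumes both devices are roughly equally fast.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper type (not OpenCL API): the slice of a 1-D index
   range assigned to one device. */
typedef struct {
    size_t offset; /* first global index handled by this device */
    size_t count;  /* number of work-items handled by this device */
} device_slice;

/* Split `total` work-items across `num_devices` as evenly as possible.
   The first (total % num_devices) devices each get one extra item, so
   the slices are contiguous and cover the whole range exactly once. */
device_slice slice_for_device(size_t total, size_t num_devices,
                              size_t device_index)
{
    assert(num_devices > 0 && device_index < num_devices);

    size_t base = total / num_devices;
    size_t extra = total % num_devices;

    device_slice s;
    s.count = base + (device_index < extra ? 1 : 0);
    /* Devices before this one contributed `base` items each, plus one
       extra item for each of them that is among the first `extra`. */
    s.offset = device_index * base
             + (device_index < extra ? device_index : extra);
    return s;
}
```

On the host side, each device's `offset` and `count` could then be passed as `global_work_offset` and `global_work_size` to `clEnqueueNDRangeKernel` on that device's queue, so that `get_global_id(0)` already falls in the right range (as far as I know this needs OpenCL 1.1 or later; in 1.0 the offset must be NULL, so it would have to be passed as a kernel argument instead).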

I’m guessing that since OpenCL allows multiple devices in a single context, I shouldn’t need more than one context, but I’d like confirmation of that. I’m also guessing that, since a buffer resides in one particular device’s memory, reads and writes from the other device are what make my solution so slow…

More generally my question is: should I use

- the same kernel for each device, or multiple copies of the same kernel?

- the same buffers for each device, or different buffers? If different, how can I force a buffer to be located on a specific device?

- one context for all devices, or one per device? (Of course this would mean different kernels and different buffers, but I’m not even sure there’s a way of doing that.)

I hope that was clear. Thank you for any information.

Edit: I just noticed an extremely similar post from about a week ago. Sorry for the hasty post.