What happens when I allocate memory on multiple devices?

I hope I’m not asking something obvious. If I am, forgive me, and please point me to some documentation that explains this clearly.

clCreateBuffer takes only the context, not the device(s) on which the memory should be allocated. The only place where a buffer is tied to one specific device is in the read/write commands, through the queue.

When is the buffer actually allocated, and where does it live? If I have two devices, is it allocated on both, or only once a read/write is issued?

If it is allocated on both devices, is it automatically synchronized between them, and if not, how do I synchronize it? That raises another interesting question: what happens if I have two queues for a single device and I use the same buffer with both?

Hope that I’m not running in too many circles …



When using clCreateBuffer() you only provide the context, and the buffer object is created within that context. Whether it is actually allocated on the devices in the context at that time is not defined by the specification, so it is implementation-specific. On NVIDIA GPUs, the device memory that holds the buffer is not allocated until that device is specifically asked to use the data. For read-only buffers, this happens when a clEnqueueWrite* command is issued to that device’s command queue. For write-only buffers it is trickier: the allocation takes place the first time a kernel that has the buffer set as an argument is executed on that device, or at the first clEnqueueRead* command for that buffer on a command queue associated with the device.

OpenCL does not assume that data can be transferred directly between devices in the same context, so whether that happens is implementation-specific. Strictly speaking, you need to transfer the data from one device to the other explicitly: issue a clEnqueueRead* command on the command queue attached to the first device, and then, once that read has completed, a clEnqueueWrite* command on the command queue of the second device. This of course transfers the data through the host. The same cl_mem object is used in both commands.

As for having two queues to a single device and passing the same buffer to both queues, well, to me that’s just bad programming. Why would you want to do such a thing? I haven’t been tempted to try it myself, but I’d expect the behavior in that case to be undefined.

Hope that helps,

Liad Weinberger.

Thanks, that clears a few things up.

I’ve just noticed that I misphrased my last question. What I meant is: what happens if I have two queues for DIFFERENT devices in the same context and I enqueue the buffer on both? I guess I can extrapolate the behavior from what you wrote, since different devices are not automatically synchronized. But what happens if I use a mapped buffer, i.e. a buffer that is supposed to be automatically synchronized between device and host?


Even with the rephrasing, I’m not sure why you’d want to do such a thing, unless you’re talking about mapping different regions for writing, in which case I can see how it’d be useful. At any rate, read the documentation for clEnqueueMapBuffer(); the notes section in particular should prove useful.

Hope that helps.