What happens when I allocate memory on multiple devices?

laughingrice · February 3, 2011, 11:31pm

I hope that I’m not asking something that is obvious, forgive me if I do and please add a pointer to some documentation that is clear about this if it is.

When allocating a buffer using clCreateBuffer it only takes the context, but not the device(s) where the memory should be allocated. The only real place where the buffer is one-to-one associated with a specific device is with the read/write commands, via the queue.

When is the buffer actually allocated, and where does it exist? (Is it allocated on both devices if I have two, or only once a read/write is called?)

If it is allocated on both devices, is it automatically synchronized between them, and if not, how do I synchronize it? This raises another interesting question: what happens if I have two queues for a single device and I use the same buffer with both?

Hope that I’m not running in too many circles …

Thanks

weliad · February 11, 2011, 10:36pm

Hi,

When using clCreateBuffer() you only provide the context. The buffer object is created within the context. Whether or not it is actually allocated on the devices within the context at that time is not defined by the specification and is therefore implementation-specific. On NVIDIA GPUs, the device memory that holds the buffer is not allocated until the device is specifically asked to use the data. For read-only buffers, this would be when a clEnqueueWrite* command is issued to that device’s command queue. For write-only buffers, this is even trickier: the actual allocation takes place on the first execution of a kernel that has the buffer set as an argument, or at the first clEnqueueRead* command for that buffer on a command queue associated with the device.

OpenCL does not assume that data can be transferred directly between devices within the same context, so any such behaviour is implementation-specific. Technically, you need to explicitly transfer the data from one device to the other by issuing a clEnqueueRead* command on the command queue attached to the first device, and then a synchronized clEnqueueWrite* command on the command queue of the second device. This of course transfers the data through the host. The same cl_mem object is used in both commands.

As for having two queues to a single device and passing the same buffer on both queues, well, to me that’s just bad programming. Why would you want to do such a thing? I’d expect (I have not been tempted to try it myself) that the behaviour in this case is undefined.

Hope that helps,

Liad Weinberger.

laughingrice · February 12, 2011, 3:03pm

Thanks, clears a few things up.

I’ve just seen that I mis-phrased my last question. What I meant is: what happens if I have two queues for DIFFERENT devices on the same context and I enqueue the buffer on both? I guess that I can extrapolate the behavior from what you wrote, as different devices are not automatically synchronized. But what happens if I use a mapped buffer, i.e. a buffer that is supposed to be automatically synchronized between device and host?

weliad · February 12, 2011, 9:11pm

Hi,

Even with the rephrasing, I’m not sure why you’d want to do such a thing, unless you’re talking about mapping different regions for writing, in which case I can see how it would be useful. At any rate, read the documentation for clEnqueueMapBuffer(). The notes section in particular should prove useful.

Hope that helps.
