Application does not scale when using cl::Buffer-Object

centershock · March 17, 2011, 4:07pm

Hello,

I have the following problem:

My application does not scale across multiple GPUs. It is always a bit slower on more GPUs than on fewer.

I was able to determine that a cl::Buffer object is causing this. I use the buffer as follows:

First I create an ordinary array of 20 int elements with malloc() (the elements are filled in later):

int* pOverlap_region = (int*) malloc(80);

After it is filled, I create the buffer object:

// Direct initialization; no access flag is given, so the buffer defaults to CL_MEM_READ_WRITE.
cl::Buffer overlap_region(
    this->context.getOpenCLContext(), CL_MEM_COPY_HOST_PTR, 80, pOverlap_region, &err);

this->context.getOpenCLContext() returns the context.
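Both the malloc() call and the cl::Buffer constructor above hard-code 80 bytes, which only equals 20 elements when sizeof(int) is 4. A minimal plain C++ sketch of deriving the byte count from the element count instead; kOverlapElements, kOverlapBytes and allocateOverlapRegion are hypothetical names, and the fill values are placeholders for whatever the application actually stores:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical constant: number of int elements in the overlap region
// (20 in the original post).
constexpr std::size_t kOverlapElements = 20;

// Byte size derived from the element count rather than hard-coded as 80,
// so it stays correct if sizeof(int) is not 4 on some platform.
constexpr std::size_t kOverlapBytes = kOverlapElements * sizeof(int);

// Hypothetical helper: allocates the host array and fills it with
// placeholder values (the real application fills it with its own data).
// Returns nullptr if the allocation fails; the caller must std::free() it.
int* allocateOverlapRegion() {
    int* p = static_cast<int*>(std::malloc(kOverlapBytes));
    if (p == nullptr) {
        return nullptr;
    }
    for (std::size_t i = 0; i < kOverlapElements; ++i) {
        p[i] = static_cast<int>(i);
    }
    return p;
}
```

kOverlapBytes would then replace the literal 80 both in the malloc() call and in the size argument of the cl::Buffer constructor.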

Then it is set as an argument for the kernel:

err |= kernel.setArg(3, overlap_region);

If this buffer is created but not set as a kernel argument, the application scales across multiple GPUs.

Does anybody know why it behaves like this?

Thanks for your replies

philipjfry · March 18, 2011, 1:16pm

OpenCL does not make many promises about buffers that are shared by multiple devices, although this is discussed briefly in Appendix A.1 of the OpenCL 1.0 specification.

In general, it should be valid to use a buffer on multiple devices at the same time as long as it is not modified. Modifying a shared buffer requires explicit synchronization (see the appendix); otherwise the results are undefined.

Another question is what NVIDIA's OpenCL implementation does in such circumstances. If it somehow serializes kernel calls on different devices that share resources, that would explain the behaviour.

The easiest way to find out would be to use the profiler (either the computeprof application or the low-level interface that writes simple text logs or CSV-formatted files). By looking at the timestamps for kernel invocation and for GPU start and end times, you could easily find out what is happening.
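Once start and end timestamps for each kernel launch have been collected, for example from the profiler log or from the CL_PROFILING_COMMAND_START and CL_PROFILING_COMMAND_END values of an event on a queue created with CL_QUEUE_PROFILING_ENABLE, serialization shows up as per-GPU execution intervals that never overlap. A minimal plain C++ sketch of that check, assuming the timestamps are in nanoseconds and that the GPUs report them against a common time base (worth verifying for the implementation in use); KernelRun and anyConcurrentExecution are hypothetical names:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical record of one kernel execution, with timestamps taken from
// the profiler output. Assumes all devices share a common time base.
struct KernelRun {
    int device;          // index of the GPU the kernel ran on
    std::uint64_t start; // kernel start timestamp (ns)
    std::uint64_t end;   // kernel end timestamp (ns)
};

// True if the half-open intervals [start, end) of the two runs intersect.
bool overlaps(const KernelRun& a, const KernelRun& b) {
    return a.start < b.end && b.start < a.end;
}

// True if kernels on *different* devices were ever running at the same time.
// False for a multi-GPU run means the launches were effectively serialized.
bool anyConcurrentExecution(const std::vector<KernelRun>& runs) {
    for (std::size_t i = 0; i < runs.size(); ++i) {
        for (std::size_t j = i + 1; j < runs.size(); ++j) {
            if (runs[i].device != runs[j].device && overlaps(runs[i], runs[j])) {
                return true;
            }
        }
    }
    return false;
}
```

If anyConcurrentExecution returns false for a two-GPU run while each kernel still takes about as long as on a single GPU, that would point to the implementation serializing the launches rather than to the kernels themselves.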

Related topics

OpenCL API of clCreateBuffer() does not work as expected in a abnormal case · CUDA Programming and Performance · 2 replies · 816 views · February 20, 2019
enqueueWriteBuffer for multiple devices · CUDA Programming and Performance · 0 replies · 12034 views · April 4, 2011
Best Practice for Memory Managment in OpenCL · CUDA Programming and Performance · 3 replies · 4861 views · May 14, 2011
[SOLVED] What causes my OpenCL kernel serialized when running on multiple GPUs? · CUDA Programming and Performance (kernel) · 1 reply · 899 views · August 8, 2020
troubles with CL/GL texture interoperability clCreateFromGLTextureXD fail but clCreateFromGLBuffer w · CUDA Programming and Performance · 3 replies · 1735 views · November 10, 2010
memory sharing in a multi-gpu environment · CUDA Programming and Performance · 7 replies · 6689 views · April 4, 2010
How does clCreateBuffer actually work? We don't supply a cl_device_id · CUDA Programming and Performance · 2 replies · 7203 views · December 20, 2009
Detecting buffer allocation failure? · CUDA Programming and Performance · 0 replies · 1417 views · July 22, 2010
How to handle CL_MEM_OBJECT_ALLOCATION_FAILURE errors if amount of useable memory is not known? · CUDA Programming and Performance · 8 replies · 15532 views · October 9, 2017
Low FPS and overlap of usage · OpenGL · 0 replies · 74 views · July 12, 2024