Hello,
I have the following problem:
My application does not scale across multiple GPUs: it is always slightly slower on more GPUs than on fewer.
I have found that a cl::Buffer object is causing this. I use the buffer as follows:
First I allocate an ordinary array of 20 int elements with malloc() (they are filled later):
int* pOverlap_region = (int*) malloc(20 * sizeof(int));
After it is filled, I create the buffer object:
cl::Buffer overlap_region(
this->context.getOpenCLContext(), CL_MEM_COPY_HOST_PTR, 20 * sizeof(int), pOverlap_region, &err);
this->context.getOpenCLContext() returns the context.
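Sizing the host array and the buffer from the element count, rather than as a raw byte count, keeps both allocations consistent with the element type. A minimal sketch of the host side in plain C++, assuming 20 int elements as described above; makeOverlapRegion is a hypothetical helper, and std::vector is swapped in for malloc():

```cpp
#include <cstddef>
#include <vector>

// Assumption: the overlap region holds 20 int elements, as described above.
constexpr std::size_t kOverlapCount = 20;

// Byte count for both the host allocation and the cl::Buffer size argument.
// It equals 80 only on platforms where int is 4 bytes wide.
constexpr std::size_t kOverlapBytes = kOverlapCount * sizeof(int);

// Hypothetical helper: host-side storage that frees itself, unlike malloc().
// Elements start at 0 and would be filled in before the buffer is created.
std::vector<int> makeOverlapRegion() {
    return std::vector<int>(kOverlapCount, 0);
}
```

With CL_MEM_COPY_HOST_PTR the runtime copies the host data when the buffer is created, so region.data() and kOverlapBytes could stand in for pOverlap_region and the size argument, and the vector does not need to outlive the buffer.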
Then I set it as a kernel argument:
err |= kernel.setArg(3, overlap_region);
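The err |= line accumulates results across several calls. Because CL_SUCCESS is 0 and OpenCL error codes are negative, the accumulated value stays 0 only if every call succeeded; a nonzero result signals that something failed, but it is generally not the code of any single failing call. A minimal sketch in plain C++, with a hypothetical stand-in constant for CL_SUCCESS (so it compiles without the OpenCL headers) and a hypothetical helper accumulateErrors:

```cpp
#include <initializer_list>

// Hypothetical stand-in for CL_SUCCESS, which is 0 in the OpenCL headers.
constexpr int kClSuccess = 0;

// Hypothetical helper: OR together the return codes of several calls,
// the way `err |= ...` does after each setArg().
int accumulateErrors(std::initializer_list<int> codes) {
    int err = kClSuccess;
    for (int code : codes) {
        err |= code;  // any nonzero code makes err nonzero from here on
    }
    return err;
}
```

For example, two failures with codes -38 and -49 combine to -33, which matches neither, so comparing the accumulated err against CL_SUCCESS is the only meaningful check.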
If I create this buffer but do not set it as a kernel argument, the application scales across multiple GPUs.
Does anybody know why it behaves like this?
Thanks for your replies