Execute one kernel with different attributes at the same time

Hi all,

I’m currently developing a sample project for my company.

I have one multidimensional array and I want to square every element.

Every row of the array should be calculated by one kernel instance.

Of course every kernel instance has the same code but different arguments.

For example:

OpenCL-Thread'1' squares array[0][0...m]

OpenCL-Thread'2' squares array[1][0...m]




OpenCL-Thread'n' squares array[n-1][0...m]

How do I implement this in OpenCL?



Your question is rather general, and I think you would be better off reading some tutorials and programming guides; there is usually more work put into those kinds of texts than into replies in a forum like this. If you have already read such documents but still have questions, then you might want to rephrase them into something more specific. Are you asking about memory access patterns, problem domain setup, kernel launching, or something else?

The kernel launching is the problem.

I want to define a buffer for every line in the array.

But I don’t know how to launch the kernel instances.

Let’s clarify some terminology. You say you have a multidimensional array, and the pseudo-code snippet in your first post indicates that it is a two-dimensional array. Let’s call it a matrix for short and to avoid ambiguity. Your latest post states that you want to create a separate buffer for every line in the array. This is most often a bad idea. Instead, create one large buffer holding the entire matrix and use addressing expressions to find a specific row and column. The kernel below is an example of how one might let each thread of a kernel launch walk across a matrix row. Not tested, so assume errors somewhere, but the idea should be correct.

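__kernel void processByLine(__global float* data, uint width, uint height) {
  // Each thread (work-item) handles one matrix row.
  int myRow = get_global_id(0);
  // Threads beyond the last row have nothing to do.
  if(myRow >= height)
    return;
  for(uint i=0; i<width; ++i) {
    uint idx = i*height + myRow;
    // processElement is a placeholder for the per-element work, e.g. squaring.
    data[idx] = processElement(data[idx]);
  }
}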

Writing a kernel is not my problem.

The problem is launching it several times!

And how does that work?

int myRow = get_global_id(0);
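
As a hedged illustration of how that line works: a single clEnqueueNDRangeKernel call with a one-dimensional global work size equal to the number of rows starts that many work-items, and get_global_id(0) returns a different value, from 0 up to the global size minus one, in each of them. So the kernel does not need to be enqueued once per row. The plain C sketch below only emulates this on the CPU with a loop; the names square_row and emulate_launch are made up for the illustration and are not part of OpenCL, and squaring stands in for processElement.

```c
#include <stddef.h>

/* Hypothetical stand-in for the kernel body: 'global_id' plays the role of
   get_global_id(0), i.e. the row this work-item is responsible for. */
void square_row(float *data, unsigned width, unsigned height, size_t global_id)
{
    if (global_id >= height)
        return; /* surplus work-item, nothing to do */
    for (unsigned i = 0; i < width; ++i) {
        /* Same addressing expression as in processByLine. */
        size_t idx = (size_t)i * height + global_id;
        data[idx] = data[idx] * data[idx];
    }
}

/* Emulates one launch with a 1-D global size. A real OpenCL runtime may run
   these iterations in parallel; here they simply run one after another. */
void emulate_launch(float *data, unsigned width, unsigned height,
                    size_t global_size)
{
    for (size_t gid = 0; gid < global_size; ++gid)
        square_row(data, width, height, gid);
}
```

Calling emulate_launch with a global size larger than the number of rows is harmless here, which is the reason for the bounds check at the top of the kernel.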


PS: Good hint with the buffer.
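
The single-buffer hint could be sketched on the host side roughly as follows. This is only a plain C illustration of the layout, assuming the idx = column*height + row addressing used in processByLine and assuming the host matrix is stored row by row; the helper names matrix_index and pack_matrix are hypothetical. The resulting flat array is what one would hand to a single clCreateBuffer call instead of creating one buffer per row.

```c
#include <stddef.h>

/* Position of element (row, col) inside the flat buffer. */
size_t matrix_index(size_t row, size_t col, size_t height)
{
    return col * height + row;
}

/* Copies a row-major host matrix (height rows of width floats each)
   into one contiguous array of width*height floats using that layout. */
void pack_matrix(const float *row_major, size_t width, size_t height,
                 float *flat)
{
    for (size_t r = 0; r < height; ++r)
        for (size_t c = 0; c < width; ++c)
            flat[matrix_index(r, c, height)] = row_major[r * width + c];
}
```

With this layout, neighbouring rows of the same column sit next to each other in memory, so threads with consecutive row numbers touch consecutive addresses.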

Not sure I understand the problem. What happens when you launch it the second time?

Then the return code of clEnqueueNDRangeKernel tells me that something went wrong.

I have now realized that the kernel does not launch at all… I will tell you when it runs again…
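
The thread does not say which code was returned, so the following is only a general sketch for anyone hitting the same wall: turning the numeric return value of clEnqueueNDRangeKernel into a name usually narrows down the cause quickly. The function name ndrange_error_name is hypothetical; the numeric values are the ones defined in the Khronos cl.h header, written out here so the snippet compiles without an OpenCL SDK. Only a subset of the possible codes is listed.

```c
/* Maps a few common clEnqueueNDRangeKernel return values to their names. */
const char *ndrange_error_name(int err)
{
    switch (err) {
    case 0:   return "CL_SUCCESS";
    case -4:  return "CL_MEM_OBJECT_ALLOCATION_FAILURE";
    case -5:  return "CL_OUT_OF_RESOURCES";
    case -6:  return "CL_OUT_OF_HOST_MEMORY";
    case -36: return "CL_INVALID_COMMAND_QUEUE";
    case -45: return "CL_INVALID_PROGRAM_EXECUTABLE";
    case -48: return "CL_INVALID_KERNEL";
    case -52: return "CL_INVALID_KERNEL_ARGS";   /* an argument was never set */
    case -53: return "CL_INVALID_WORK_DIMENSION";
    case -54: return "CL_INVALID_WORK_GROUP_SIZE";
    case -55: return "CL_INVALID_WORK_ITEM_SIZE";
    case -63: return "CL_INVALID_GLOBAL_WORK_SIZE";
    default:  return "unlisted OpenCL error code";
    }
}
```

Printing this name together with the raw number right after the enqueue call is usually enough to tell a forgotten clSetKernelArg apart from a bad work size.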

I now have the code running again.

Everything is fine now and the OpenCL multithreading works.