Question about clSetKernelArg

When passing an array of size n to clSetKernelArg(), where n = m × p (p is the number of kernels in the system and m is an integer greater than 1), how is the data distributed to the kernels? Is there a way to assign the first m elements to the first kernel, the second m elements to the second kernel, …?

Thank you for your help,
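One common way to get this kind of split, sketched here under assumptions rather than taken from any particular sample, is to pass the whole array as a single buffer argument and let each work-item compute its own slice from its index. clSetKernelArg() itself only binds the buffer to the kernel parameter; it does not distribute anything. The plain C below models the index arithmetic only. `chunk_begin`, `chunk_end`, `sum_chunk`, and the `id` parameter are hypothetical names, with `id` standing in for the value that `get_global_id(0)` returns inside an OpenCL kernel.

```c
#include <stddef.h>

/* Hypothetical helpers: index range [begin, end) owned by worker `id`
 * when an array of n = m * p elements is split into p chunks of m. */
size_t chunk_begin(size_t id, size_t m) { return id * m; }
size_t chunk_end(size_t id, size_t m)   { return (id + 1) * m; }

/* Models what one work-item would do: touch only its own m elements.
 * The "work" here is simply summing the chunk. */
long sum_chunk(const int *data, size_t id, size_t m)
{
    long sum = 0;
    for (size_t i = chunk_begin(id, m); i < chunk_end(id, m); i++)
        sum += data[i];
    return sum;
}
```

For example, with m = 4, worker 2 owns indices 8 through 11 of the shared array.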

The “oclMatrixMul” sample in the Nvidia sample code is what I should study to answer the question above.

Thank you,

I am writing an OddEvenTransposition program using OpenCL. In my program, I have an array of size n = 16000, I use the maximum number of kernels, 32, and I would like to give each kernel 500 consecutive elements of the data. I will need to do a lot of comparing and swapping of memory slots. Even though each kernel has only a small chunk of data, it should be able to access part of its neighbors’ chunks.
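For reference, here is a sequential C sketch of odd-even transposition sort on the whole array. It illustrates the algorithm only and is not the OpenCL program described above; `compare_swap` and `odd_even_sort` are hypothetical names. Within one phase the compare-and-swap pairs are disjoint, so they are the part that could run in parallel, and a pair that straddles two chunks is where a kernel would need to reach into its neighbor's data.

```c
#include <stddef.h>

/* Swap a[i] and a[i + 1] if they are out of order. */
static void compare_swap(int *a, size_t i)
{
    if (a[i] > a[i + 1]) {
        int tmp = a[i];
        a[i] = a[i + 1];
        a[i + 1] = tmp;
    }
}

/* Odd-even transposition sort: n phases, alternating between pairs
 * that start at even indices and pairs that start at odd indices. */
void odd_even_sort(int *a, size_t n)
{
    for (size_t phase = 0; phase < n; phase++) {
        /* Even phase: (0,1), (2,3), ...  Odd phase: (1,2), (3,4), ... */
        for (size_t i = phase % 2; i + 1 < n; i += 2)
            compare_swap(a, i);  /* pairs within one phase are independent */
    }
}
```

Calling `odd_even_sort(a, n)` sorts `a` in ascending order after n phases.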

Originally, I thought I just needed to use the following loops to distribute the data set:

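cl_mem GPUVector[MAXPROCESSES];
/* Create one read-only device buffer per chunk. host_ptr must be NULL here
   because neither CL_MEM_USE_HOST_PTR nor CL_MEM_COPY_HOST_PTR is set;
   the data is uploaded by clEnqueueWriteBuffer below. */
for (unsigned int i = 0; i < MAXPROCESSES; i++) {
    GPUVector[i] = clCreateBuffer(GPUContext, CL_MEM_READ_ONLY,
                                  sizeof(cl_int) * CHUNKSIZE, NULL, &err_num);
    if (err_num != CL_SUCCESS)
        printf("Error in create buffer %d\n", err_num);
}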

/* Copy each chunk of HostVector into its device buffer (non-blocking write). */
for (unsigned int i = 0; i < MAXPROCESSES; i++) {
    err_num = clEnqueueWriteBuffer(GPUCommandQueue, GPUVector[i], CL_FALSE, 0,
                                   sizeof(cl_int) * CHUNKSIZE, HostVector + (i * CHUNKSIZE),
                                   0, NULL, NULL);
    if (err_num != CL_SUCCESS)
        printf("Error in write to GPUbuffers - %d\n", err_num);
}

However, when I do this I get the error CL_INVALID_KERNEL when executing the kernel. Could you please give me a hand?
Thank you,
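CL_INVALID_KERNEL is returned when the kernel object passed to a call is not a valid kernel, so checking the return codes of the earlier program build and kernel creation calls is a reasonable first step. The numeric codes printed by the loops above are easier to act on when translated to their names. A minimal sketch of such a lookup follows; `cl_error_name` is a hypothetical helper, not part of OpenCL, and the values are hard-coded from the standard `CL/cl.h` definitions so that the snippet compiles without the OpenCL headers. It covers only a few codes.

```c
/* Hypothetical helper: map a few OpenCL error codes to their names.
 * Values match the definitions in CL/cl.h; the list is not exhaustive. */
const char *cl_error_name(int err)
{
    switch (err) {
    case 0:   return "CL_SUCCESS";
    case -30: return "CL_INVALID_VALUE";
    case -37: return "CL_INVALID_HOST_PTR";
    case -38: return "CL_INVALID_MEM_OBJECT";
    case -45: return "CL_INVALID_PROGRAM_EXECUTABLE";
    case -46: return "CL_INVALID_KERNEL_NAME";
    case -48: return "CL_INVALID_KERNEL";
    case -52: return "CL_INVALID_KERNEL_ARGS";
    default:  return "unknown OpenCL error";
    }
}
```

It could be used in place of the raw number, for example `printf("Error in create buffer: %s\n", cl_error_name(err_num));`.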