I am having trouble translating some CUDA code to OpenCL. The CUDA host code that launches the kernel is:
int tasksCount = 489984;
int desiredWarps = (tasksCount + 31) / 32;
Vec2i blockSize(32, 4);
int blockWarps = (blockSize.x * blockSize.y + 31) / 32;
Vec2i gridSize((desiredWarps + blockWarps - 1) / blockWarps, 1);
return module->launchKernelTimed(kernel, blockSize, gridSize);
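To make the numbers concrete, here is a minimal C++ sketch of the same launch arithmetic. The helper name `gridWidthForTasks` is hypothetical (it is not part of the original code base), and the warp size of 32 is taken from the snippet above.

```cpp
#include <cassert>

// Hypothetical helper, not from the original code: number of blocks along x
// so that every task gets one thread, using the same warp-based rounding
// as the CUDA host code above.
int gridWidthForTasks(int tasksCount, int blockX, int blockY)
{
    const int warpSize = 32;  // warp size assumed by the original snippet
    assert(tasksCount >= 0 && blockX > 0 && blockY > 0);

    // ceil(tasksCount / warpSize): warps needed to cover all tasks
    int desiredWarps = (tasksCount + warpSize - 1) / warpSize;

    // ceil(threadsPerBlock / warpSize): warps contained in one block
    int blockWarps = (blockX * blockY + warpSize - 1) / warpSize;

    // ceil(desiredWarps / blockWarps): blocks needed along x
    return (desiredWarps + blockWarps - 1) / blockWarps;
}
```

With tasksCount = 489984 and a 32x4 block this gives 15312 warps, 4 warps per block, and therefore a grid of 3828 x 1 blocks, i.e. exactly 489984 threads.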
I translated this launch configuration to OpenCL as follows:
//int tasksCount = 489984;
int blockWidth = 32; // One warp per row.
int blockHeight = 4; // 4*32 = 128 threads, optimal for GTX480
localWork = cl::NDRange(blockWidth, blockHeight);
int globalWidth = tasksCount / blockHeight;
if ( globalWidth % blockWidth != 0 )
    globalWidth = (globalWidth / blockWidth + 1) * blockWidth;
globalWork = cl::NDRange(globalWidth, blockHeight);
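As a cross-check of the global size, here is a small C++ sketch of the same computation. The helper names `roundUpToMultiple` and `globalWidthForTasks` are hypothetical. Unlike the snippet above, the sketch uses a ceiling division for tasksCount / blockHeight, on the assumption that tasksCount might not always be a multiple of blockHeight.

```cpp
#include <cassert>

// Hypothetical helper: round `value` up to the next multiple of `multiple`.
int roundUpToMultiple(int value, int multiple)
{
    assert(value >= 0 && multiple > 0);
    return ((value + multiple - 1) / multiple) * multiple;
}

// Hypothetical helper: global width for work-groups of
// blockWidth x blockHeight when the global height is exactly blockHeight,
// as in the OpenCL host code above.
int globalWidthForTasks(int tasksCount, int blockWidth, int blockHeight)
{
    // Ceiling division: a plain tasksCount / blockHeight would drop the
    // last tasks whenever tasksCount is not a multiple of blockHeight.
    int columns = (tasksCount + blockHeight - 1) / blockHeight;

    // The global size has to be a multiple of the work-group width.
    return roundUpToMultiple(columns, blockWidth);
}
```

For tasksCount = 489984 this gives a global size of 122496 x 4, which is 3828 work-groups of 32x4, the same count as the CUDA grid. When the global size is padded (for example tasksCount = 489985 gives a width of 122528), the kernel would presumably need a bounds check on the computed index before using it.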
In the kernel, I translated this CUDA index computation:
rayidx = threadIdx.x + blockDim.x * (threadIdx.y + blockDim.y * (blockIdx.x + gridDim.x * blockIdx.y));
into
rayidx = get_global_id(1) * get_global_size(0) + get_global_id(0);
But honestly, I'm not convinced that it is correct. Do you have any advice?
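To make the question concrete, here is a minimal host-side C++ sketch that emulates both index formulas without a GPU. All type and function names (`Launch`, `Item`, `cudaStyleIndex`, `globalIdIndex`, and so on) are hypothetical. It assumes a zero global offset, so that get_global_id(d) equals get_group_id(d) * get_local_size(d) + get_local_id(d), and the usual correspondence threadIdx ↔ get_local_id, blockDim ↔ get_local_size, blockIdx ↔ get_group_id, gridDim ↔ get_num_groups.

```cpp
#include <cassert>
#include <vector>

struct Launch {
    int localW, localH;    // blockDim.x, blockDim.y / get_local_size(0), (1)
    int groupsX, groupsY;  // gridDim.x, gridDim.y   / get_num_groups(0), (1)
};

struct Item {
    int lx, ly;  // threadIdx.x, threadIdx.y / get_local_id(0), (1)
    int gx, gy;  // blockIdx.x, blockIdx.y   / get_group_id(0), (1)
};

// The CUDA formula from the post: threads of one block get consecutive indices.
int cudaStyleIndex(const Launch& l, const Item& i)
{
    return i.lx + l.localW * (i.ly + l.localH * (i.gx + l.groupsX * i.gy));
}

// The OpenCL formula from the post, with the global ids written out
// under the zero-offset assumption.
int globalIdIndex(const Launch& l, const Item& i)
{
    int globalId0   = i.gx * l.localW + i.lx;  // get_global_id(0)
    int globalId1   = i.gy * l.localH + i.ly;  // get_global_id(1)
    int globalSize0 = l.groupsX * l.localW;    // get_global_size(0)
    return globalId1 * globalSize0 + globalId0;
}

// Calls visit(item) once for every work-item of the launch.
template <typename Visit>
void forEachItem(const Launch& l, Visit visit)
{
    for (int gy = 0; gy < l.groupsY; ++gy)
        for (int gx = 0; gx < l.groupsX; ++gx)
            for (int ly = 0; ly < l.localH; ++ly)
                for (int lx = 0; lx < l.localW; ++lx)
                    visit(Item{lx, ly, gx, gy});
}

// True if indexOf maps the work-items onto 0..N-1 with no gaps or repeats.
template <typename IndexOf>
bool coversEachIndexOnce(const Launch& l, IndexOf indexOf)
{
    const int total = l.localW * l.localH * l.groupsX * l.groupsY;
    assert(total > 0);
    std::vector<int> hits(total, 0);
    bool inRange = true;
    forEachItem(l, [&](const Item& i) {
        int idx = indexOf(l, i);
        if (idx < 0 || idx >= total) inRange = false;
        else ++hits[idx];
    });
    if (!inRange) return false;
    for (int h : hits)
        if (h != 1) return false;
    return true;
}

// Number of work-items for which the two formulas give different indices.
int countMismatches(const Launch& l)
{
    int mismatches = 0;
    forEachItem(l, [&](const Item& i) {
        if (cudaStyleIndex(l, i) != globalIdIndex(l, i)) ++mismatches;
    });
    return mismatches;
}
```

With Launch{32, 4, 3828, 1}, both formulas cover every index from 0 to 489983 exactly once under these assumptions, but they do not assign the same index to the same work-item. For example, the item at local position (0, 1) in group 0 gets index 32 from the CUDA formula and 122496 from the get_global_id formula. If the order matters, a one-to-one translation would keep the shape of the CUDA expression and only swap in get_local_id, get_local_size, get_group_id and get_num_groups.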
Thanks