Hi All,
I am new to CUDA and I am porting some existing code from the runtime API to the driver API. I am running into a strange problem when launching kernels via cuLaunchGrid. The same kernel works with the following runtime API launch:
[codebox]
dim3 block(16, 16);
dim3 grid(width/block.x, height/block.y);
kernel<<<grid, block>>>(parameters);  // execution configuration order is <<<grid, block>>>
[/codebox]
yet it simply returns CUDA_ERROR_UNKNOWN with the (theoretically) equivalent call in the driver API:
[codebox]
dim3 block(16, 16);
dim3 grid(width/block.x, height/block.y);
cuFuncSetBlockShape(kernel, block.x, block.y, 1);
// …parameter passing via cuParamSet*
cuLaunchGrid(kernel, grid.x, grid.y);
[/codebox]
The error is not returned by cuLaunchGrid itself but by the next call (in my case, cuCtxSynchronize()). However, if I limit the launch to one dimension, for example:
[codebox]
cuFuncSetBlockShape(kernel, block.x, 1, 1);
cuLaunchGrid(kernel, grid.x, 1);
[/codebox]
it will execute my kernel, but of course the result then only covers block.x * grid.x threads. I've been stuck for two days and I am sure it must be something trivial. Any comments are greatly appreciated!
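To make the coverage difference concrete, here is a minimal host-side sketch in plain C (no CUDA calls, so it says nothing about the cause of the error itself). The helper name `threads_launched` is hypothetical and used only for this illustration. It just multiplies out the block shape and grid shape passed to cuFuncSetBlockShape and cuLaunchGrid above.

```c
#include <assert.h>

/* Hypothetical helper: total number of threads started by a launch with
 * a (bx x by x 1) block shape and a (gx x gy) grid, i.e. the shapes
 * passed to cuFuncSetBlockShape and cuLaunchGrid in the post. */
static long threads_launched(int bx, int by, int gx, int gy) {
    assert(bx > 0 && by > 0 && gx > 0 && gy > 0);
    return (long)bx * by * gx * gy;
}

/* Grid extent along one axis, using the same integer division as
 * dim3 grid(width/block.x, height/block.y). Note that this truncates,
 * so any remainder of `size` past a multiple of `block` is not covered. */
static int grid_extent(int size, int block) {
    assert(size > 0 && block > 0);
    return size / block;
}
```

Assuming, purely as an example, width = 640 and height = 480: `grid_extent` gives a 40 x 30 grid, so the 2D launch starts `threads_launched(16, 16, 40, 30)` = 307200 threads (one per pixel), while the 1D workaround starts only `threads_launched(16, 1, 40, 1)` = 640 threads, i.e. a single row.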