NVIDIA Developer Forums

problem launching kernel with cuLaunchGrid


xCatG July 15, 2009, 8:55am 1

Hi All,

I am new to CUDA and am porting some existing code from the runtime API to the driver API. I am running into a strange problem when launching kernels via cuLaunchGrid. The same kernel works with the following runtime API launch:

[codebox]
dim3 block(16, 16);
dim3 grid(width/block.x, height/block.y);

// The execution configuration takes the grid first, then the block.
kernel<<<grid, block>>>(parameters);
[/codebox]

yet it simply returns CUDA_ERROR_UNKNOWN with the (theoretically) equivalent call in the driver API:

[codebox]
dim3 block(16, 16);
dim3 grid(width/block.x, height/block.y);

cuFuncSetBlockShape(kernel, block.x, block.y, 1);

// …parameter passing via cuParamSet*

cuLaunchGrid(kernel, grid.x, grid.y);
[/codebox]

The error is returned on the next call (in my case, cuCtxSynchronize()). However, if I limit the launch to one dimension, like this:

[codebox]
cuFuncSetBlockShape(kernel, block.x, 1, 1);
cuLaunchGrid(kernel, grid.x, 1);
[/codebox]

it executes my kernel, though the result then only covers block.x * grid.x elements, of course. I've been stuck for two days, and I am sure it must be something trivial. Any comments are greatly appreciated!
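A side note on the grid size used in both launches above: `width/block.x` is an integer division, so it drops any partial block when the image size is not a multiple of 16. The following host-only C++ sketch works through that arithmetic. The helper names (`blocksTruncated`, `blocksRoundedUp`, `covered`) are made up for illustration and are not part of any CUDA API.

```cpp
#include <cassert>

// Blocks per dimension the way the post computes it: integer division truncates.
inline unsigned blocksTruncated(unsigned extent, unsigned block) {
    assert(block > 0);
    return extent / block;
}

// Blocks per dimension rounded up, so a partial final block is still launched.
inline unsigned blocksRoundedUp(unsigned extent, unsigned block) {
    assert(block > 0);
    return (extent + block - 1) / block;
}

// Number of elements along one dimension that a launch with
// `blocks` blocks of `block` threads can reach.
inline unsigned covered(unsigned blocks, unsigned block) {
    return blocks * block;
}
```

For a width of 100 and a block width of 16, the truncating form gives 6 blocks (96 elements reached), while the rounded-up form gives 7 blocks (112 threads, of which 100 are needed). Rounding up is only safe when the kernel guards its accesses with a bounds check, as the `ix < w && iy < h` test in the kernel posted in this thread does.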

xCatG July 15, 2009, 9:25am 2

Below is my kernel, a simple float-to-byte buffer conversion:

[codebox]
__global__ void Cvtkernel(int w, int h, float *dFloat, unsigned char *dByte)
{
    // Global pixel coordinates of this thread.
    int ix = blockDim.x * blockIdx.x + threadIdx.x;
    int iy = blockDim.y * blockIdx.y + threadIdx.y;

    // Skip threads that fall outside the image.
    if (ix < w && iy < h)
        dByte[ix + iy * w] = (unsigned char) dFloat[ix + iy * w];
}
[/codebox]

With the last line commented out, cuLaunchGrid() does not complain; but if the kernel writes anything at all, I get CUDA_ERROR_UNKNOWN. Even changing the last line to

[codebox]
dByte[ix + iy * w] = 0;
[/codebox]

brings down the whole thing. Am I doing the index calculation wrong?
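One way to check the index calculation without a GPU is to replay it on the host. The sketch below is plain C++, not CUDA: it loops over every (block, thread) pair of a 2D launch, applies the same `ix`/`iy` formulas and bounds check as the kernel above, and counts how often each pixel is written. The function name `coversEveryPixelOnce` is hypothetical, and the grid is rounded up here, which is an assumption (the post uses plain division).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side emulation of the indexing used in Cvtkernel.
// Returns true if each of the w*h pixels is written exactly once.
bool coversEveryPixelOnce(int w, int h, int blockX, int blockY) {
    assert(w > 0 && h > 0 && blockX > 0 && blockY > 0);

    // Round the grid up so partial blocks at the edges are included.
    const int gridX = (w + blockX - 1) / blockX;
    const int gridY = (h + blockY - 1) / blockY;

    std::vector<int> hits(static_cast<std::size_t>(w) * h, 0);

    for (int by = 0; by < gridY; ++by)
        for (int bx = 0; bx < gridX; ++bx)
            for (int ty = 0; ty < blockY; ++ty)
                for (int tx = 0; tx < blockX; ++tx) {
                    const int ix = blockX * bx + tx;  // blockDim.x * blockIdx.x + threadIdx.x
                    const int iy = blockY * by + ty;  // blockDim.y * blockIdx.y + threadIdx.y
                    if (ix < w && iy < h)
                        ++hits[ix + iy * w];          // same offset the kernel writes to
                }

    for (int count : hits)
        if (count != 1) return false;
    return true;
}
```

Calling `coversEveryPixelOnce(100, 37, 16, 16)` returns true: the largest offset the guarded write can produce is `(w-1) + (h-1)*w = w*h - 1`, so the index arithmetic stays inside a `w*h` buffer. A failure on any write therefore suggests looking at the buffers and parameters handed to the kernel rather than at the indexing.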

xCatG July 15, 2009, 10:21am 3

I found my problem: it was caused by an earlier call to cuMemcpyHtoD(), where I had accidentally cast the host memory pointer to (void**) instead of (void*). After changing it to (void*), my problems are gone.
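The post does not show the offending cuMemcpyHtoD() call, so the following is only a guess at the class of mistake, sketched in host-only C++. A cast changes the static type of a pointer, not the address it holds, so a `(void**)` cast tends to matter when it is silencing the compiler about passing the address of the pointer variable instead of the buffer itself. `asSource` is a hypothetical stand-in for any function that takes a `const void*` source argument.

```cpp
#include <cassert>

// Stand-in for a copy routine's `const void* src` parameter:
// returns exactly the address such a routine would receive.
inline const void* asSource(const void* src) { return src; }

// Cast applied to the buffer pointer itself: only the static type
// changes, so the address passed on is still the buffer.
inline bool castKeepsAddress(float* hostBuf) {
    return asSource((void**)hostBuf) == asSource((void*)hostBuf);
}

// Cast applied to the ADDRESS of the pointer variable: the callee
// would read from the pointer's own storage, not from the buffer.
inline bool addressOfPointerIsBuffer(float* hostBuf) {
    return asSource((void**)&hostBuf) == asSource((void*)hostBuf);
}
```

For any valid buffer, `castKeepsAddress` returns true and `addressOfPointerIsBuffer` returns false. In the second case a copy of `w * h * sizeof(float)` bytes would read far past the pointer variable, which fits the kind of garbage data that only shows up later as a kernel failure.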

Is there a debugger for device API on Windows?
