My kernels fail to launch on a GeForce GTX 260.

Hi, I’ve just changed my graphics card from a GeForce 8800 GTX to a GeForce GTX 260. All the tests from the CUDA SDK pass, but when I try to execute my own functions, the CUDA error "failed to launch" appears. All the cudaMemcpy and cudaMalloc operations work fine; the problem appears only with the kernels. In emulation mode everything also works fine. On the 8800 GTX there was no problem at all. Has anyone had a similar problem?

If someone is curious about the code, I execute:

// convert unsigned char table to floats
dim3 dimBlock(BLOCK_CHARTOFLOAT_W, BLOCK_CHARTOFLOAT_H);	// this is 16x16
dim3 dimGrid(_w/(BLOCK_CHARTOFLOAT_W), _h/BLOCK_CHARTOFLOAT_H);	// w and h are image width and height
CUDA_SAFE_CALL(convertUCharToFloat<<<dimGrid, dimBlock, 0, stream>>>((float4*)pfGlobalData,	// converting unsigned chars to floats
	(unsigned int*)pbGlobalData, _w));

and the kernel is:

__global__ void convertUCharToFloat(float4 *dest, unsigned int *source, int iWidth)
{
	float4 destValue;
	unsigned int position = blockIdx.x*BLOCK_CHARTOFLOAT_W+threadIdx.x
		+ (blockIdx.y*BLOCK_CHARTOFLOAT_H + threadIdx.y)*(iWidth>>2);
	unsigned int iData = *(source + position);

	destValue.x = __uint2float_rd(iData&0x000000FF);
	destValue.y = __uint2float_rd((iData>>8)&0x000000FF);
	destValue.z = __uint2float_rd((iData>>16)&0x000000FF);
	destValue.w = __uint2float_rd((iData>>24)&0x000000FF);
	__syncthreads();
	dest[position] = destValue;
}

I’d appreciate your help.

Best Regards,

Jacek

Have you tried recompiling with the switch -arch sm_13?
Example:
nvcc.exe -arch sm_13 -(other options)

Didn’t help. The funniest thing is that the kernel launches successfully the first time but fails on the second launch, no matter which kernel I use. Even when I execute the same kernel twice, the second execution fails.

BTW, the operating system is Windows XP 32-bit.

I’ve just noticed that the cudaVideoDecode sample doesn’t work either. It hangs the whole computer.

Any clue?

Since it worked on the 8800, which has a lower compute capability, see if you can recompile with either
-arch sm_10 or -arch sm_11.
Check that you have the NVIDIA driver with CUDA support installed.
Also, try running the kernel on just one data element, for example,
CUDA_SAFE_CALL(convertUCharToFloat<<<1, 1, 0, stream>>>