Hi, I’ve just changed my graphics card from a GeForce 8800 GTX to a GeForce GTX 260. All the tests from the CUDA SDK pass, but when I try to execute my own functions, a CUDA "failed to launch" error appears. All the cudaMemcpy and cudaMalloc operations work fine; the problem appears only with the kernels. In emulation mode everything also works fine. On the 8800 GTX there was no problem at all. Has anyone had a similar problem?
If anyone is curious about the code, I launch the kernel like this:
// convert unsigned char table to floats
dim3 dimBlock(BLOCK_CHARTOFLOAT_W, BLOCK_CHARTOFLOAT_H); // this is 16x16
dim3 dimGrid(_w/(BLOCK_CHARTOFLOAT_W), _h/BLOCK_CHARTOFLOAT_H); // w and h are image width and height
CUDA_SAFE_CALL(convertUCharToFloat<<<dimGrid, dimBlock, 0, stream>>>((float4*)pfGlobalData, // converting unsigned chars to floats
(unsigned int*)pbGlobalData, _w));
And the kernel is:
__global__ void convertUCharToFloat(float4 *dest, unsigned int *source, int iWidth)
{
    float4 destValue;
    unsigned int position = blockIdx.x*BLOCK_CHARTOFLOAT_W + threadIdx.x
                          + (blockIdx.y*BLOCK_CHARTOFLOAT_H + threadIdx.y)*(iWidth>>2);
    unsigned int iData = *(source + position);
    destValue.x = __uint2float_rd(iData&0x000000FF);
    destValue.y = __uint2float_rd((iData>>8)&0x000000FF);
    destValue.z = __uint2float_rd((iData>>16)&0x000000FF);
    destValue.w = __uint2float_rd((iData>>24)&0x000000FF);
    __syncthreads();
    dest[position] = destValue;
}
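One way to narrow this down is to check on the host whether the kernel's index arithmetic stays inside the buffers. The following is only a minimal sketch, not a confirmed diagnosis. It assumes that the source buffer holds _w*_h unsigned chars (that is, _w*_h/4 unsigned ints), that the blocks are 16x16 as in the comment above, and that _w and _h are multiples of 16. The helper names maxPosition, sourceElements and indexInBounds are hypothetical and exist only for this illustration.

```cpp
// Host-side check of the index formula used in convertUCharToFloat.
// Assumed block size, taken from the "this is 16x16" comment in the launch code.
const unsigned int BLOCK_CHARTOFLOAT_W = 16;
const unsigned int BLOCK_CHARTOFLOAT_H = 16;

// Hypothetical helper: largest 'position' any thread computes for a w x h image,
// using the same grid dimensions (w/16, h/16) and the same formula as the kernel.
unsigned int maxPosition(unsigned int w, unsigned int h)
{
    unsigned int gridW = w / BLOCK_CHARTOFLOAT_W;
    unsigned int gridH = h / BLOCK_CHARTOFLOAT_H;
    if (gridW == 0 || gridH == 0)
        return 0; // no blocks would be launched
    // Largest x and y indices reached by the last thread of the last block.
    unsigned int maxX = (gridW - 1) * BLOCK_CHARTOFLOAT_W + (BLOCK_CHARTOFLOAT_W - 1);
    unsigned int maxY = (gridH - 1) * BLOCK_CHARTOFLOAT_H + (BLOCK_CHARTOFLOAT_H - 1);
    return maxX + maxY * (w >> 2);
}

// Assumption: the source image is w*h unsigned chars, read as w*h/4 unsigned ints.
unsigned int sourceElements(unsigned int w, unsigned int h)
{
    return (w * h) >> 2;
}

// True if every computed position is a valid index into the source buffer.
bool indexInBounds(unsigned int w, unsigned int h)
{
    return maxPosition(w, h) < sourceElements(w, h);
}
```

Under these assumptions, a 640x480 image gives a largest index of 77279 against 76800 available elements, so indexInBounds(640, 480) returns false. If the buffers are laid out differently (for example, if _w already counts unsigned ints per row rather than bytes), the numbers change and the check has to be adapted.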
I’d appreciate your help.
Best Regards,
Jacek