Kernel error in release mode

I use Visual Studio 2010 on Windows 7.

I get a CUDA error only in release mode, not in debug mode.
The error appears right after a kernel call and is caught with cudaGetLastError.
It occurs even when the kernel body is empty and the kernel's parameter list is empty.

I can't use the debugger because my graphics card only supports CUDA compute capability 1.1.

The error might be caused by a race condition or an out-of-bounds memory access. In that case you should run the debugger and the profiler.

This is my kernel…

__global__ void functFilter(float *imgIn, float *imgPut, int height, int width, int pitch){

And this is my call

int thBlock_x = 32;
	int thBlock_y = 16;
	int numBlocks_x = (this->_nxsIn+padding)/thBlock_x;
	int numBlocks_y = (this->_nxsIn+padding)/thBlock_y;

	dim3 dimGrid(numBlocks_x,numBlocks_y);
	dim3 dimBlocks(thBlock_x,thBlock_y);
	//for(int k=1; k<=this->_nxsIn; k++)


	e = cudaGetLastError();
	if (e != cudaSuccess)
		printf("kernel error\n");
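
For reference, a minimal sketch of a launch check that prints the error code and its text instead of a fixed message. It uses an empty kernel with no parameters, as described above. The names emptyKernel and launchAndCheck and the 4x4 grid are placeholders for illustration, not taken from the original code.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the real kernel: empty body, no parameters.
__global__ void emptyKernel() {}

// Launches the kernel and reports both launch-time and run-time errors.
// Returns true only if both checks pass.
static bool launchAndCheck(dim3 grid, dim3 block)
{
    emptyKernel<<<grid, block>>>();

    // Errors detected at launch (bad configuration, no usable device code, ...).
    cudaError_t e = cudaGetLastError();
    if (e != cudaSuccess) {
        printf("launch error %d: %s\n", (int)e, cudaGetErrorString(e));
        return false;
    }

    // Errors raised while the kernel was running show up only after a sync.
    e = cudaDeviceSynchronize();
    if (e != cudaSuccess) {
        printf("execution error %d: %s\n", (int)e, cudaGetErrorString(e));
        return false;
    }
    return true;
}

int main()
{
    dim3 dimGrid(4, 4);        // placeholder grid size
    dim3 dimBlocks(32, 16);    // same block shape as in the post (512 threads)
    return launchAndCheck(dimGrid, dimBlocks) ? 0 : 1;
}
```

Printing the string from cudaGetErrorString usually narrows the problem down much faster than a bare "kernel error" line, especially when no debugger is available.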

Did you run cuda-memcheck?

No, I didn't run it. I think it is not compatible with compute capability 1.1.

I am not aware of any such limitations for cuda-memcheck or the debugger. Compile your code with the debugging flags -g -G and then run the program. You should check the documentation; all compute capabilities are supported, and I hope you are using CUDA Toolkit 4.x or 5.x.

I have an NVIDIA GeForce 8400M GT GPU.
My driver version is not compatible with Nsight Visual Studio, so the CUDA memory checker is not available.
These drivers were provided to me by NVIDIA and are the only version compatible with Visual Studio 2010 for the 8400M GT.

The only information that I can get is the return code: 8 (0x8)
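
If that 8 is the cudaError_t value, the runtime itself can translate it. A small sketch, assuming device 0 is the 8400M GT, that prints the device's compute capability and the text for code 8. As far as I know, in the CUDA 4.x/5.x headers 8 is cudaErrorInvalidDeviceFunction, which would typically mean the binary contains no code built for this GPU's architecture. The numbering differs between toolkit versions, so it is safer to print the string than to assume.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    int dev = 0;  // assumption: the 8400M GT is device 0
    cudaDeviceProp prop;

    cudaError_t e = cudaGetDeviceProperties(&prop, dev);
    if (e != cudaSuccess) {
        printf("cudaGetDeviceProperties failed: %s\n", cudaGetErrorString(e));
        return 1;
    }

    // The architecture the kernels must be compiled for.
    printf("device %d: %s, compute capability %d.%d\n",
           dev, prop.name, prop.major, prop.minor);

    // Meaning of return code 8 in the toolkit this program was built with.
    printf("error code 8: %s\n", cudaGetErrorString((cudaError_t)8));
    return 0;
}
```

If the text turns out to be "invalid device function", it would be worth comparing the code generation setting of the Release and Debug configurations in the project's CUDA build properties (for a 1.1 device, something like compute_11,sm_11). A difference there would explain an error that only appears in release mode, even with an empty kernel.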