About histogram64Kernel() Function

I am using CUDA 2.1 on 32-bit Windows Vista with VS2005.

When I run the histogram64 example from the CUDA SDK in the EmuDebug configuration, something goes wrong. The relevant code is:

#ifndef CUDA_NO_SM_11_ATOMIC_INTRINSICS
		cutilSafeCall( cudaMemset(d_Result64, 0, HISTOGRAM_SIZE) );
		histogram64Kernel<<<blockN, THREAD_N>>>(
			d_Result,
			(unsigned int *)d_Data,
			dataN / 4
		);
		cutilCheckMsg("histogram64Kernel() execution failed\n");
		cutilSafeCall( cudaMemcpy(h_Result, d_Result64, HISTOGRAM_SIZE, cudaMemcpyDeviceToHost) );
#else
		histogram64Kernel<<<blockN, THREAD_N>>>(
			d_Result64,
			(unsigned int *)d_Data,
			dataN / 4
		);
		cutilCheckMsg("histogram64Kernel() execution failed\n");

It builds successfully. But when execution reaches this launch:

#else
		histogram64Kernel<<<blockN, THREAD_N>>>(
			d_Result64,
			(unsigned int *)d_Data,
			dataN / 4
		);

it shows the message "histogram64Kernel() execution failed".
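
One way to get more detail than that generic message might be to ask the runtime for the error string right after the launch. The following is only a minimal sketch under some assumptions: dummyKernel and the buffer d_out are hypothetical stand-ins (not the SDK's histogram64Kernel or d_Result64), and cudaThreadSynchronize() is used because this is CUDA 2.1; later toolkits replace it with cudaDeviceSynchronize().

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in kernel; the real histogram64Kernel is not reproduced here.
__global__ void dummyKernel(unsigned int *out)
{
    out[threadIdx.x] = threadIdx.x;
}

int main()
{
    const int THREADS = 64;          // assumed block size, for illustration only
    unsigned int *d_out = NULL;      // hypothetical output buffer

    cudaError_t err = cudaMalloc((void **)&d_out, THREADS * sizeof(unsigned int));
    if (err != cudaSuccess) {
        printf("cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    dummyKernel<<<1, THREADS>>>(d_out);

    // Errors in the launch itself (bad configuration, etc.) show up here.
    err = cudaGetLastError();
    if (err != cudaSuccess) {
        printf("kernel launch failed: %s\n", cudaGetErrorString(err));
        cudaFree(d_out);
        return 1;
    }

    // Errors raised while the kernel runs show up after synchronizing.
    err = cudaThreadSynchronize();
    if (err != cudaSuccess) {
        printf("kernel execution failed: %s\n", cudaGetErrorString(err));
        cudaFree(d_out);
        return 1;
    }

    printf("kernel finished without reported errors\n");
    cudaFree(d_out);
    return 0;
}
```

Applying the same two checks after the histogram64Kernel launch should at least show which CUDA error is hiding behind the "execution failed" message.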

What can I do?

Any reply to this question is appreciated.