Strange problem in Release and Debug modes. Please help!

Hi,

I have written a __global__ function that should execute 600 times, once per thread.

__global__ void fooGlobal( … )
{
int idx = blockIdx.x * blockDim.x + threadIdx.x;
if( idx < 600 ) // idx runs 0..599, i.e. 600 threads; idx < 601 would admit 601
{
// some calculation here
}
}

and the launch configuration is:
int size = (600+32-1)/32; // size = 19
dim3 numBlocks(size,1,1);
dim3 numthreads(32,1,1);
fooGlobal<<<numBlocks,numthreads>>>( … );

So the kernel launches 19*32 = 608 threads.

Because of the if condition in the kernel, the calculation should execute only 600 times.
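The arithmetic above can be checked with a small counting kernel. This is only a sketch: countGuarded and countGuardedThreads are hypothetical names that stand in for fooGlobal and its host code, and the real calculation is replaced by an atomicAdd on a counter.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for fooGlobal: every thread that passes the guard
// increments a counter instead of doing the real calculation.
__global__ void countGuarded(int n, int *counter)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n)
    {
        atomicAdd(counter, 1);
    }
}

// Launches ceil(n / 32) blocks of 32 threads and returns how many threads
// passed the guard, or -1 if any CUDA call failed.
int countGuardedThreads(int n)
{
    const int threadsPerBlock = 32;
    const int numBlocks = (n + threadsPerBlock - 1) / threadsPerBlock;

    int *dCounter = nullptr;
    if (cudaMalloc(&dCounter, sizeof(int)) != cudaSuccess) return -1;
    if (cudaMemset(dCounter, 0, sizeof(int)) != cudaSuccess)
    {
        cudaFree(dCounter);
        return -1;
    }

    countGuarded<<<numBlocks, threadsPerBlock>>>(n, dCounter);

    // A blocking cudaMemcpy on the default stream waits for the kernel
    // to finish before copying, and reports execution errors.
    int hCounter = -1;
    cudaError_t err = cudaMemcpy(&hCounter, dCounter, sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(dCounter);
    return (err == cudaSuccess) ? hCounter : -1;
}
```

On a working device, countGuardedThreads(600) should return 600 even though 608 threads are launched, because the last 8 threads fail the guard.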

In EmuRelease and EmuDebug it runs 600 times, but in Release and Debug it runs only 233 times (roughly 7 blocks; 7*32 = 224).

Why is it so?

Maybe you should call cudaThreadSynchronize() after the kernel launch. Kernel launches are asynchronous, so the host can continue before all threads have finished.
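A minimal sketch of what that could look like, with error checks added so a failed launch is not mistaken for a short run. The names fillDoubled and launchAndWait are hypothetical, not from the original code. The sketch uses cudaDeviceSynchronize(), which replaces the deprecated cudaThreadSynchronize() in current CUDA toolkits; on an old toolkit the older call would go in the same place.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel standing in for fooGlobal: writes 2*idx into out[idx].
__global__ void fillDoubled(int n, int *out)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n)
    {
        out[idx] = 2 * idx;
    }
}

// Launches the kernel over n elements and waits for it to finish.
// Returns true only if both the launch and the execution succeeded.
bool launchAndWait(int n, int *dOut)
{
    const int threadsPerBlock = 32;
    const int numBlocks = (n + threadsPerBlock - 1) / threadsPerBlock;

    fillDoubled<<<numBlocks, threadsPerBlock>>>(n, dOut);

    // Errors in the launch itself (e.g. an invalid configuration).
    cudaError_t launchErr = cudaGetLastError();
    if (launchErr != cudaSuccess)
    {
        fprintf(stderr, "launch failed: %s\n", cudaGetErrorString(launchErr));
        return false;
    }

    // Block the host until the kernel has finished; errors that occur
    // while the kernel runs are reported here.
    cudaError_t syncErr = cudaDeviceSynchronize();
    if (syncErr != cudaSuccess)
    {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(syncErr));
        return false;
    }
    return true;
}
```

If launchAndWait returns false, the printed error string is usually a better starting point than counting how many threads appear to have run.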