What's wrong with this simple code?

// Solved, thanks to Ailleur. I was muddleheaded at the time…

=============================

I get a “launch time out failure” for the kernel below. If I comment out the 3 lines marked “//CMT”, it runs OK. Why? Thanks!

 dim3 Dg(1, 1, 1);

 dim3 Db(64, 1, 1);

//kernel

{

	__shared__ int s_top;

	__shared__ int s_offsetQ;

	__shared__ int s_cnt; //throughput of result of this block

	const int tx = threadIdx.x;

	if(tx == 0) //CMT

	{ //CMT

		s_cnt = 0;

		s_offsetQ = blockIdx.x;	//bx

	} //CMT

	while(s_offsetQ < nQ)

	{

...

		if(tx == 0)

			s_offsetQ += gridDim.x; //block number

		__syncthreads();

	}

}

I would guess that it's because s_offsetQ doesn't have a value for every other thread, yet you still ask all of them to loop on it.
I would wrap the whole thing in if(tx==0).
At that point, though, only one thread per block will be active.

Thanks! I just edited my original post to add the “…”. I have other lines there, but the problem appears even when those lines are commented out.
You said "s_offsetQ doesn't have a value for every other thread", but since it's shared, I can't agree. Thanks!

Right you are, I missed the fact that it is shared. There should be a __syncthreads() after the (first) assignment then, so that the other threads cannot read s_offsetQ before thread 0 has written it.
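
For anyone finding this thread later, here is a minimal sketch of what the suggested fix might look like. It is not the original poster's kernel: the kernel name stridedLoop, the iterations counter, the atomicAdd standing in for the elided "…" body, and the host helper runStridedLoop are all made up for illustration. The extra barrier before the update of s_offsetQ is my own precaution so that no thread tests the loop condition against a value that thread 0 is changing; the thread itself only asked for the barrier after the first assignment.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: counts how many loop iterations all blocks execute.
__global__ void stridedLoop(int nQ, int *iterations)
{
    __shared__ int s_offsetQ;          // loop index shared by the whole block

    const int tx = threadIdx.x;

    if (tx == 0)
        s_offsetQ = blockIdx.x;        // only thread 0 initializes the shared value
    __syncthreads();                   // the fix: everyone waits for that write

    while (s_offsetQ < nQ)
    {
        if (tx == 0)
            atomicAdd(iterations, 1);  // stand-in for the elided loop body

        __syncthreads();               // assumption: all threads are done reading s_offsetQ
        if (tx == 0)
            s_offsetQ += gridDim.x;    // stride by the number of blocks
        __syncthreads();               // the update is visible before the next test
    }
}

// Hypothetical host helper: launches the kernel with 64 threads per block
// and returns the total iteration count (error checking omitted for brevity).
int runStridedLoop(int nQ, int blocks)
{
    int *d_iterations = nullptr;
    int h_iterations = 0;

    cudaMalloc(&d_iterations, sizeof(int));
    cudaMemset(d_iterations, 0, sizeof(int));

    stridedLoop<<<blocks, 64>>>(nQ, d_iterations);

    // cudaMemcpy waits for the kernel to finish before copying.
    cudaMemcpy(&h_iterations, d_iterations, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_iterations);
    return h_iterations;
}
```

Because every thread in the block now sees the same s_offsetQ at each test of the loop condition, all threads enter and leave the loop together, so every __syncthreads() is reached by the whole block. Across all blocks, the loop body should run once per index below nQ.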