Valid Block and Thread configurations

Hi NVIDIA,

I was testing different configurations in CUDA before choosing one. My program's output confirms that the number of threads per block must be less than or equal to 512, but some configurations that look valid are not working.

For example:

Config: By=1, Bx=1, Ty=512, Tx=1 : Not Valid
Config: By=1, Bx=1, Ty=32, Tx=1 : Not Valid

Config: By=1, Bx=1, Ty=1, Tx=1 : Not Valid

Config: By=1, Bx=1, Ty=1, Tx=2 : Blocks=1 Threads=2 Valid
Config: By=1, Bx=1, Ty=1, Tx=4 : Blocks=1 Threads=4 Valid
Config: By=1, Bx=1, Ty=1, Tx=8 : Blocks=1 Threads=8 Valid
Config: By=1, Bx=1, Ty=1, Tx=16 : Blocks=1 Threads=16 Valid
Config: By=1, Bx=1, Ty=1, Tx=32 : Blocks=1 Threads=32 Valid
Config: By=1, Bx=1, Ty=1, Tx=64 : Blocks=1 Threads=64 Valid
Config: By=1, Bx=1, Ty=1, Tx=128 : Blocks=1 Threads=128 Valid
Config: By=1, Bx=1, Ty=1, Tx=256 : Blocks=1 Threads=256 Valid
Config: By=1, Bx=1, Ty=1, Tx=512 : Blocks=1 Threads=512 Valid

(where Tz = 1 in all cases)
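One way to separate "the hardware rejects this launch" from "my test reports a failure" is to check each configuration against the device limits on its own. Below is a minimal sketch in plain C. It assumes the compute capability 1.x limits (at most 512 threads per block, block dimensions up to 512 x 512 x 64, grid dimensions up to 65535 x 65535). Newer GPUs allow 1024 threads per block, so on real hardware these values should be read from `cudaGetDeviceProperties` (`maxThreadsPerBlock`, `maxThreadsDim`, `maxGridSize`) rather than hard-coded. The function name `config_fits_limits` is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed limits for a compute capability 1.x device; on real hardware,
   take them from cudaDeviceProp instead of hard-coding them. */
#define MAX_THREADS_PER_BLOCK 512
#define MAX_BLOCK_X 512
#define MAX_BLOCK_Y 512
#define MAX_BLOCK_Z 64
#define MAX_GRID_X 65535
#define MAX_GRID_Y 65535

/* Hypothetical helper: returns true if a launch with grid (bx, by) and
   block (tx, ty, tz) stays within the limits above. */
static bool config_fits_limits(int bx, int by, int tx, int ty, int tz)
{
    /* Every launch dimension must be at least 1. */
    assert(bx > 0 && by > 0 && tx > 0 && ty > 0 && tz > 0);

    if (bx > MAX_GRID_X || by > MAX_GRID_Y)
        return false;
    if (tx > MAX_BLOCK_X || ty > MAX_BLOCK_Y || tz > MAX_BLOCK_Z)
        return false;

    /* The total number of threads in one block is the product of its dimensions. */
    return tx * ty * tz <= MAX_THREADS_PER_BLOCK;
}
```

Every configuration marked "Not Valid" above passes this check (Ty = 512 with Tx = 1 is exactly 512 threads), so the hardware limits alone do not explain those failures.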

Either my test program is giving incorrect results or there is some other technical reason for this. (My test program file is attached.)
Kindly help me with your comments.

Thanks
configTest.txt (4.3 KB)

There is a problem with your logic in the kernel:

if (tx == 0)
{
    data[0] = 30;   // just a sample number
}
else if (tx == TX-1 && ty == TY-1 && bx == BX-1 && by == BY-1)
{
    data[1] = 27;
}

The "else if" branch is never taken when TX is 1: every thread then has tx == 0, so every thread, including the last one, takes the "if" branch. As a result, 27 is never written to data[1], and the host code flags the run as an error. Change the "else if" to a plain "if" and you'll get correct results.
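The effect can be reproduced without a GPU by emulating the kernel's branch logic on the host. The sketch below is plain C and makes two assumptions: that tx, ty, bx, by in the kernel are threadIdx.x, threadIdx.y, blockIdx.x, blockIdx.y, and that the host treats a run as valid only when data[0] == 30 and data[1] == 27. The function name `kernel_writes_both` is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical host-side emulation of the kernel's writes to data[].
   Each (bx, by, tx, ty) tuple stands for one CUDA thread (Tz = 1).
   use_else_if = true  -> original "if / else if" structure
   use_else_if = false -> suggested fix with two independent "if"s */
static bool kernel_writes_both(int BX, int BY, int TX, int TY, bool use_else_if)
{
    int data[2] = {0, 0};
    assert(BX > 0 && BY > 0 && TX > 0 && TY > 0);

    for (int by = 0; by < BY; ++by)
        for (int bx = 0; bx < BX; ++bx)
            for (int ty = 0; ty < TY; ++ty)
                for (int tx = 0; tx < TX; ++tx) {
                    /* True only for the very last thread of the grid. */
                    bool last = (tx == TX - 1 && ty == TY - 1 &&
                                 bx == BX - 1 && by == BY - 1);
                    if (use_else_if) {
                        if (tx == 0)
                            data[0] = 30;
                        else if (last)      /* skipped whenever tx == 0 */
                            data[1] = 27;
                    } else {
                        if (tx == 0)
                            data[0] = 30;
                        if (last)           /* evaluated for every thread */
                            data[1] = 27;
                    }
                }

    /* Assumed host-side check: both values must have been written. */
    return data[0] == 30 && data[1] == 27;
}
```

With `use_else_if` set, every TX = 1 configuration fails and every TX >= 2 configuration passes, which matches the pattern in the question; with two independent "if"s, all of them pass.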

Also, just a general point: You don’t need __syncthreads() at the end of a kernel.

Yes, I've got it now. Thanks for the help.