Problem with thread indexes and emulation

Hi all,

I’m experiencing some weird behaviour in emulation mode, which I have traced to using a switch statement with threadIdx.y as the expression…

If I emulate this kernel:

__global__ void kernel1( ) {

  fprintf(stderr,"tx=%d ty=%d\n",threadIdx.x,threadIdx.y);

}

I will get an output like this (launching a (1,1) grid and (8,2) block):

tx=0 ty=0

tx=1 ty=0

tx=2 ty=0

tx=3 ty=0

tx=4 ty=0

tx=5 ty=0

tx=6 ty=0

tx=7 ty=0

tx=0 ty=1

tx=1 ty=1

tx=2 ty=1

tx=3 ty=1

tx=4 ty=1

tx=5 ty=1

tx=6 ty=1

tx=7 ty=1

However, if I add a switch statement somewhere in the kernel like this:

__global__ void kernel2( ) {

  fprintf(stderr,"tx=%d ty=%d\n",threadIdx.x,threadIdx.y);

  switch (threadIdx.y) {

    case 1: ...

    case 2: ...

  }

}

the thread indexes get corrupted and I get this output:

tx=0 ty=0

tx=1 ty=0

tx=2 ty=0

tx=3 ty=0

tx=4 ty=0

tx=5 ty=0

tx=6 ty=0

tx=7 ty=0

tx=0 ty=1

tx=1 ty=0

tx=2 ty=0

tx=3 ty=0

tx=4 ty=0

tx=5 ty=0

tx=6 ty=0

tx=7 ty=0

Any explanation or something I’m missing? Or is this a bug?

Thanks.

The first thing that comes to mind: do you have break statements after each case in the switch?

Yes. In fact, changing the switch to an equivalent ‘if / else if’ structure produces the same result, and I have double-checked that there are no out-of-place writes in the code.

I can’t tell what you want to do with the switch(), but I think it should be like this:

switch (threadIdx.y) {
case 0: …
case 1: …
}

since blockDim.y = 2, threadIdx.y will be either 0 or 1.

What the switch does shouldn’t matter much here; in any case, the printf comes before it, so the printed results should be correct.

The code given was just a quick example; the real code uses 8x8 blocks, and I need to switch in groups of 8 to perform some shared-memory loads.

Try to arrange your switches so that each case covers a full warp; that way you avoid divergent warps. If you switch on groups of 8 threads, you end up with 4 divergent paths per warp.