Hi all,
I’m seeing some weird behaviour in emulation mode, which I’ve traced to a switch statement that uses threadIdx.y as its controlling expression.
If I emulate this kernel:
__global__ void kernel1( ) {
  fprintf(stderr,"tx=%d ty=%d\n",threadIdx.x,threadIdx.y);
}
I get output like this (launching a (1,1) grid and an (8,2) block):
tx=0 ty=0
tx=1 ty=0
tx=2 ty=0
tx=3 ty=0
tx=4 ty=0
tx=5 ty=0
tx=6 ty=0
tx=7 ty=0
tx=0 ty=1
tx=1 ty=1
tx=2 ty=1
tx=3 ty=1
tx=4 ty=1
tx=5 ty=1
tx=6 ty=1
tx=7 ty=1
However, if I add a switch statement somewhere in the kernel like this:
__global__ void kernel2( ) {
  fprintf(stderr,"tx=%d ty=%d\n",threadIdx.x,threadIdx.y);
  switch (threadIdx.y) {
    case 1: ...
    case 2: ...
  }
}
The indices get corrupted and I get this output:
tx=0 ty=0
tx=1 ty=0
tx=2 ty=0
tx=3 ty=0
tx=4 ty=0
tx=5 ty=0
tx=6 ty=0
tx=7 ty=0
tx=0 ty=1
tx=1 ty=0
tx=2 ty=0
tx=3 ty=0
tx=4 ty=0
tx=5 ty=0
tx=6 ty=0
tx=7 ty=0
Is there an explanation, or something I’m missing? Or is this a bug?
Thanks.
The first thing that comes to mind: do you have break statements after each case in the switch?
Yes. In fact, replacing the switch with an equivalent ‘if / else if’ chain produces the same result. I have also double-checked that there are no stray writes anywhere in the code.
Austin
April 27, 2008, 1:58am
I can’t tell what you want to do with the “switch()”, but I think it should look like this:
switch (threadIdx.y) {
  case 0: …
  case 1: …
}
since blockDim.y = 2, threadIdx.y can only be 0 or 1.
What the switch does doesn’t really matter; note that the fprintf comes before it, so the printed indices should be correct regardless.
The code I gave was just a quick example. The real code uses 8x8 blocks, and I need to switch on groups of 8 threads to perform some shared memory loads.
DenisR
April 27, 2008, 6:05pm
Try to write your switch statements so that they branch on whole warps; that way you avoid divergent warps. If you branch on groups of 8 threads, each 32-thread warp ends up with 4 divergent paths.