early thread cancelation

hello,

for checking which threads should execute, the manual provides code (1) like

__global__ void myKernel(int* result)
{
  int tid = threadIdx.x;
  if (tid < SIZE)
  {
    result[tid] = tid;
  }
}

what about code (2) like

__global__ void myKernel(int* result)
{
  int tid = threadIdx.x;
  if (tid >= SIZE) return;
  result[tid] = tid;
}

is code (2) valid?

if yes, is code (2) more efficient?(*)

greetings,

moik

edit(*): in terms of branching and stack operations, of course not in terms of fork-join runtime!

Code (2) is valid. Whether it is more efficient is something you can check in the generated PTX code. There is no stack and no fork on the GPU, and the branching is, as far as I can see, identical.

The second one is absolutely valid. They will probably be compiled into the same code. It is just a matter of coding style, I think. I personally prefer the second one…
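
For anyone who wants to check this rather than take it on trust, here is a minimal sketch that puts both styles in one file and compares what they write. The value of SIZE, the BLOCK_THREADS launch width, and the names guardedKernel, earlyReturnKernel and kernelsAgree are assumptions made for illustration; none of them come from the posts above.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define SIZE 48           // assumed value; the original posts never define SIZE
#define BLOCK_THREADS 64  // assumed launch width, deliberately larger than SIZE

// Style (1): wrap the work in an if-block.
__global__ void guardedKernel(int* result)
{
  int tid = threadIdx.x;
  if (tid < SIZE)
  {
    result[tid] = tid;
  }
}

// Style (2): return early from out-of-range threads.
__global__ void earlyReturnKernel(int* result)
{
  int tid = threadIdx.x;
  if (tid >= SIZE) return;
  result[tid] = tid;
}

// Runs both kernels on separate buffers and reports whether both
// wrote result[i] == i for every i < SIZE. Returns false on CUDA errors.
bool kernelsAgree()
{
  const size_t bytes = SIZE * sizeof(int);
  int* dA = nullptr;
  int* dB = nullptr;
  int hA[SIZE];
  int hB[SIZE];

  if (cudaMalloc(&dA, bytes) != cudaSuccess) return false;
  if (cudaMalloc(&dB, bytes) != cudaSuccess) { cudaFree(dA); return false; }

  // Fill both buffers with a sentinel so unwritten slots are detectable.
  cudaMemset(dA, 0xFF, bytes);
  cudaMemset(dB, 0xFF, bytes);

  // One block with more threads than SIZE, so the guard actually matters.
  guardedKernel<<<1, BLOCK_THREADS>>>(dA);
  earlyReturnKernel<<<1, BLOCK_THREADS>>>(dB);

  // cudaMemcpy blocks until the preceding kernels have finished.
  bool ok = cudaMemcpy(hA, dA, bytes, cudaMemcpyDeviceToHost) == cudaSuccess
         && cudaMemcpy(hB, dB, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;

  cudaFree(dA);
  cudaFree(dB);

  for (int i = 0; ok && i < SIZE; ++i)
  {
    ok = (hA[i] == i) && (hB[i] == i);
  }
  return ok;
}

int main()
{
  printf("%s\n", kernelsAgree() ? "identical results" : "mismatch or CUDA error");
  return 0;
}
```

Saving this under a name of your choice (say guard.cu, a hypothetical file name) and running nvcc -ptx guard.cu emits the PTX for both kernels, so the two bodies can be compared side by side. Whether they come out instruction-for-instruction identical will depend on the compiler version and target architecture.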