Hello,
to check whether a thread should do any work, the manual provides code (1) like:
__global__ void myKernel(int* result)
{
    int tid = threadIdx.x;
    // only threads with an index inside the array write a result
    if (tid < SIZE)
    {
        result[tid] = tid;
    }
}
What about code (2) like:
__global__ void myKernel(int* result)
{
    int tid = threadIdx.x;
    // threads with an index outside the array leave the kernel early
    if (tid >= SIZE) return;
    result[tid] = tid;
}
Is code (2) valid?
If yes, is code (2) more efficient? (*)
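
For reference, here is a minimal sketch of how the two variants could be run side by side to check that they write the same results. The kernel names kernelGuarded and kernelEarlyReturn, the value SIZE = 256 and the launch with 512 threads are my own assumptions for illustration; none of them come from the manual. It only checks the output and says nothing about which form is faster.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define SIZE 256      // assumed array length (not given above)
#define THREADS 512   // assumed block size, deliberately larger than SIZE

// Variant (1): the work is wrapped in an if block.
__global__ void kernelGuarded(int* result)
{
    int tid = threadIdx.x;
    if (tid < SIZE)
    {
        result[tid] = tid;
    }
}

// Variant (2): out-of-range threads return early.
__global__ void kernelEarlyReturn(int* result)
{
    int tid = threadIdx.x;
    if (tid >= SIZE) return;
    result[tid] = tid;
}

int main()
{
    int h1[SIZE], h2[SIZE];
    int *d1 = nullptr, *d2 = nullptr;

    if (cudaMalloc(&d1, sizeof(h1)) != cudaSuccess ||
        cudaMalloc(&d2, sizeof(h2)) != cudaSuccess)
    {
        printf("cudaMalloc failed\n");
        return 1;
    }

    // One block each; threads SIZE..THREADS-1 must not write anything.
    kernelGuarded<<<1, THREADS>>>(d1);
    kernelEarlyReturn<<<1, THREADS>>>(d2);

    // cudaMemcpy waits for the preceding kernels on the default stream.
    cudaError_t e1 = cudaMemcpy(h1, d1, sizeof(h1), cudaMemcpyDeviceToHost);
    cudaError_t e2 = cudaMemcpy(h2, d2, sizeof(h2), cudaMemcpyDeviceToHost);
    if (e1 != cudaSuccess || e2 != cudaSuccess)
    {
        printf("CUDA error: %s\n", cudaGetErrorString(e1 != cudaSuccess ? e1 : e2));
        return 1;
    }

    // Count entries where either variant did not store the thread index.
    int mismatches = 0;
    for (int i = 0; i < SIZE; ++i)
    {
        if (h1[i] != i || h2[i] != i) ++mismatches;
    }
    printf("mismatches: %d\n", mismatches);

    cudaFree(d1);
    cudaFree(d2);
    return mismatches != 0;
}
```

To look at the branching question itself, one option would be to compile with nvcc -ptx and compare the code generated for the two kernels.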
greetings,
moik
Edit (*): I mean more efficient in terms of branching and stack operations, not, of course, in terms of fork-join runtime.