Kill execution of a block

Hello,

I am new to CUDA.

I have a kernel where each thread in a block does an atomic add. If the sum exceeds a certain value, there is no point in running the remaining threads in the block.
So is there a way to kill a block on a certain condition, to free the multiprocessor? :teehee: :confused:
Using return; will only kill the given thread, right? :confused:

Thanks in advance.

To kill only a certain block, you can have a shared bool (or something like that) that all threads can flag for exit and can read to decide whether they should continue or exit.

eyal
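
For illustration, a minimal sketch of the shared-flag pattern eyal describes, assuming each block loops over several work items (the kernel name, expensiveWork(), gSum and LIMIT are made-up placeholders, and the float atomicAdd needs compute capability 2.0):

[codebox]#define LIMIT 345.666f

// stand-in for the real (expensive) per-item calculation
__device__ float expensiveWork(int item, int tid)
{
    return 0.001f * (item + tid);
}

__global__ void blockFlagSketch(float *gSum, int itemsPerBlock)
{
    // volatile so the compiler re-reads the flag from shared memory every time
    __shared__ volatile bool stillRunning;

    if (threadIdx.x == 0)
        stillRunning = true;          // initialise once per block
    __syncthreads();

    for (int item = 0; item < itemsPerBlock; ++item)
    {
        // no __syncthreads() inside the loop, so each thread may exit on its own;
        // the flag only ever goes from true to false, so a slightly stale read
        // just costs one extra item of work
        if (!stillRunning)
            return;

        float value = expensiveWork(item, threadIdx.x);

        if (atomicAdd(gSum, value) + value >= LIMIT)
            stillRunning = false;     // any thread may flag the whole block to stop
    }
}[/codebox]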

I can’t see that any such thing would be needed. In your code every thread would do the test anyway; afaik, letting just one thread do the test would only make the other threads in the block wait.

Further, atomic functions return the old value, so it would be easy to detect the stop condition without performing an additional read, and let all threads return through their own code.

e.g.

#define LIMIT 345.666f // or whatever

float value = 3.3f; // substitute your calculation

if (atomicAdd( &myglobalfloatvar, value ) + value >= LIMIT) return;

will do just fine afaik (assuming compute capability 2.0 for the float atomicAdd; otherwise do something similar with ints).
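
For context, here is that line inside a minimal complete kernel (the variable names and the dummy value are made up; as noted, the float atomicAdd needs compute capability 2.0):

[codebox]#define LIMIT 345.666f

__device__ float myglobalfloatvar = 0.0f;   // the running total, in global memory

__global__ void addUntilLimit()
{
    float value = 3.3f;                      // substitute your calculation

    // atomicAdd returns the old value, so old + value is the new total;
    // every thread that pushes the total past LIMIT (or arrives after that)
    // simply returns through its own code
    if (atomicAdd(&myglobalfloatvar, value) + value >= LIMIT)
        return;

    // ... the rest of the per-thread work goes here ...
}[/codebox]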

Thanks eyalhir74, jan.heckman for your replies.

Actually the calculation is somewhat expensive, which is why I want to cancel the rest of the threads.

I did something like this:

[codebox]__shared__ bool stillrunning;

if (threadIdx.x == 0)
    stillrunning = true;

__syncthreads();

if (!stillrunning)
    return;

...

if (atomicAdd( &myglobalfloatvar, value ) + value >= LIMIT)
    stillrunning = false;
[/codebox]

The kernel gives a timeout error. I think it is because of some bad memory access.

But if someone sees that this code may be causing the problem, please tell me.

Thanks in advance.

You should post the whole kernel… with the code you’ve posted there’s no way of knowing what happens after the stillrunning = false line.

Are you looping back to the code above? If so, you reset it to true again…

eyal
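
For what it’s worth, the structure eyal is hinting at would look roughly like this; the outer loop and the value are assumed, since the full kernel wasn’t posted:

[codebox]#define LIMIT 345.666f

__device__ float myglobalfloatvar = 0.0f;

__global__ void sketchWithLoop(int iterations)
{
    __shared__ volatile bool stillrunning;

    if (threadIdx.x == 0)
        stillrunning = true;          // set the flag ONCE, before the loop;
    __syncthreads();                  // inside the loop it would be re-armed every pass

    for (int i = 0; i < iterations; ++i)
    {
        if (!stillrunning)            // no __syncthreads() below, so returning here is safe
            return;

        float value = 3.3f;           // stand-in for the real calculation

        if (atomicAdd(&myglobalfloatvar, value) + value >= LIMIT)
            stillrunning = false;     // never set back to true, so the exit sticks
    }
}[/codebox]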

There’s no better way to do this than the shared variable.
