Issue when using a break statement in a CUDA kernel

Hello,
I am having a problem. My kernel has a loop that runs, say, 20 times. In most cases I need to break out well before 20 iterations, so the remaining iterations do not need to be executed. But when I use a break statement, all the values loaded into shared memory become wrong, so I cannot use break.
My question is: what happens when there is a break statement in a loop in a CUDA kernel? I understand that it causes divergence, but why should that affect the final results?
Please help me understand this.

Regards,
Mohit

Do you use __syncthreads() inside the loop?

Yes I did, just before the end of the loop.

This won’t work, since __syncthreads() must be reached by all threads of the block in the same manner. If some threads break out of the loop while others go on to the barrier, the behavior is undefined.[1]
Instead of using break inside the loop, you can set a flag variable and then make the whole loop body, excluding the __syncthreads(), conditional on that variable.

[1] It appears that on all current hardware only every warp, rather than every thread, actually needs to encounter it, but let’s keep things simple and safe for future hardware as well.

Dear Tera,

Thanks a lot for pointing this out :). This is what was causing the whole problem.

For threads that have finished their work, just execute the __syncthreads() and any data transfers to/from shared memory that may be needed by other threads, but not the rest of the loop body:

int flag = 1;
for (int i = 0; i < 20; i++) {
    // load data...
    __syncthreads();
    if (flag) {
        // do work...
        if (finished) {
            // this is where the break statement would have been
            flag = 0;
        }
    }
    __syncthreads();
}
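
In case it helps to try the pattern end to end, here is a minimal self-contained sketch of it. Everything specific in it is an assumption made up for illustration and not taken from the original kernel: the kernel name flagged_loop, the block size of 64, the "work" (adding the right-hand neighbour's shared-memory value), and the stopping rule (thread t finishes after (t % 20) + 1 iterations). The point is only that every thread keeps loading and keeps hitting both barriers for all 20 iterations, while the flag decides who still does work.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

constexpr int kThreads = 64;   // threads per block (assumed for this sketch)
constexpr int kMaxIters = 20;  // loop bound from the question

// Hypothetical kernel: each thread sums its right-hand neighbour's shared
// value until its own made-up stopping rule is met.
__global__ void flagged_loop(int *sums, int *iters_done)
{
    __shared__ int tile[kThreads];
    const int t = threadIdx.x;
    bool active = true;  // plays the role of `flag`
    int sum = 0;
    int done = 0;

    for (int i = 0; i < kMaxIters; ++i) {
        // Finished threads still load, because a neighbour may read this slot.
        tile[t] = t + i;
        __syncthreads();  // reached by every thread in every iteration

        if (active) {
            sum += tile[(t + 1) % kThreads];
            ++done;
            if (done == (t % kMaxIters) + 1) {
                active = false;  // this is where `break` would have been
            }
        }
        __syncthreads();  // keeps tile[] intact until all reads are done
    }

    sums[t] = sum;
    iters_done[t] = done;
}

// Runs one block and compares against a CPU reference.
// Returns the number of mismatching threads, or -1 if a CUDA call fails.
int run_flagged_loop()
{
    int *d_sums = nullptr, *d_iters = nullptr;
    int h_sums[kThreads], h_iters[kThreads];
    const size_t bytes = kThreads * sizeof(int);

    if (cudaMalloc(&d_sums, bytes) != cudaSuccess) return -1;
    if (cudaMalloc(&d_iters, bytes) != cudaSuccess) { cudaFree(d_sums); return -1; }

    flagged_loop<<<1, kThreads>>>(d_sums, d_iters);
    cudaError_t err = cudaMemcpy(h_sums, d_sums, bytes, cudaMemcpyDeviceToHost);
    if (err == cudaSuccess)
        err = cudaMemcpy(h_iters, d_iters, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_sums);
    cudaFree(d_iters);
    if (err != cudaSuccess) return -1;

    int mismatches = 0;
    for (int t = 0; t < kThreads; ++t) {
        const int n = (t % kMaxIters) + 1;  // iterations thread t works
        const int expected = n * ((t + 1) % kThreads) + n * (n - 1) / 2;
        if (h_iters[t] != n || h_sums[t] != expected) ++mismatches;
    }
    return mismatches;
}

int main()
{
    const int mismatches = run_flagged_loop();
    printf("mismatches: %d\n", mismatches);
    return mismatches == 0 ? 0 : 1;
}
```

Note that the loop always runs the full 20 iterations here. A thread that is done only skips the work inside the if, which is usually cheap compared with getting wrong results from shared memory.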
