Issue while using break statement in CUDA kernel

anshu · November 24, 2010, 12:34pm

Hello,
I am having a problem. My kernel has a loop that runs for, say, 20 iterations. In most cases I need to break well before 20 iterations, so the remaining iterations don't need to run. But when I use a break statement, all the values loaded into shared memory become wrong, so I cannot use it.
My question is: what happens when there is a break statement in a loop in a CUDA kernel? I understand that it causes divergence, but why should that affect the final results?
Please help me understand this.

Regards,
Mohit

tera · November 24, 2010, 2:04pm

Do you use `__syncthreads()` inside the loop?

anshu · November 25, 2010, 1:19am

Yes I did, just before the end of the loop.

tera · November 25, 2010, 5:41am

This won’t work, since `__syncthreads()` must be encountered by all threads of the block in the same manner. [1]
Instead of using `break` inside the loop, you can set a flag variable and then make the whole loop body, excluding the `__syncthreads()`, conditional on that variable.

[1] It appears that on all current hardware only all warps actually need to encounter them, but let’s keep things simple and safe for future hardware as well.
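
The same rule also means a `break` is fine when every thread of the block takes it in the same iteration. A minimal sketch of that variant, assuming a device of compute capability 2.0 or higher (needed for `__syncthreads_and()`); the kernel name, the input data, and the `limit` threshold are made up for illustration and are not from the original poster's code:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

constexpr int kThreads = 64;   // threads per block (illustrative)
constexpr int kMaxIters = 20;  // loop bound mentioned in the question

// Hypothetical workload: each thread adds its right neighbour's value from
// shared memory until its own sum reaches `limit`. The block leaves the loop
// only once every thread is done.
__global__ void sumWithUniformExit(const int* in, int* itersRun, int limit)
{
    __shared__ int tile[kThreads];
    const int tid = threadIdx.x;
    int sum = 0, done = 0, iters = 0;

    for (int i = 0; i < kMaxIters; ++i) {
        tile[tid] = in[i * kThreads + tid];  // finished threads still load
        __syncthreads();
        if (!done) {
            sum += tile[(tid + 1) % kThreads];
            done = (sum >= limit);
        }
        ++iters;
        // Barrier plus vote: non-zero only if `done` is non-zero in every
        // thread, so either the whole block breaks here or nobody does.
        if (__syncthreads_and(done)) break;
    }
    itersRun[tid] = iters;
}

// Runs one block; returns the number of threads whose iteration count differs
// from the CPU reference, or -1 on a CUDA error.
int runUniformExitExample()
{
    const int limit = 30;
    static int hIn[kMaxIters * kThreads];
    int hIters[kThreads];
    for (int i = 0; i < kMaxIters; ++i)
        for (int t = 0; t < kThreads; ++t)
            hIn[i * kThreads + t] = t % 7 + 3;

    int *dIn = nullptr, *dIters = nullptr;
    if (cudaMalloc(&dIn, sizeof(hIn)) != cudaSuccess) return -1;
    if (cudaMalloc(&dIters, sizeof(hIters)) != cudaSuccess) { cudaFree(dIn); return -1; }
    cudaMemcpy(dIn, hIn, sizeof(hIn), cudaMemcpyHostToDevice);
    sumWithUniformExit<<<1, kThreads>>>(dIn, dIters, limit);
    cudaError_t err = cudaMemcpy(hIters, dIters, sizeof(hIters), cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dIters);
    if (err != cudaSuccess) return -1;

    // CPU reference: the block runs as long as its slowest thread needs.
    int expected = 0;
    for (int t = 0; t < kThreads; ++t) {
        int sum = 0, n = 0;
        for (int i = 0; i < kMaxIters; ++i) {
            sum += hIn[i * kThreads + (t + 1) % kThreads];
            ++n;
            if (sum >= limit) break;
        }
        if (n > expected) expected = n;
    }
    int mismatches = 0;
    for (int t = 0; t < kThreads; ++t)
        if (hIters[t] != expected) ++mismatches;
    return mismatches;
}

int main()
{
    const int bad = runUniformExitExample();
    if (bad == 0) printf("all threads left the loop in the same iteration\n");
    else          printf("mismatches or CUDA error: %d\n", bad);
    return bad != 0;
}
```

This only saves time when all threads of a block tend to finish early; a single slow thread keeps the whole block in the loop.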

anshu · November 26, 2010, 8:39am

Dear Tera,

Thanks a lot for pointing this out :). This is what was causing the whole problem.

tera · November 26, 2010, 12:17pm

For threads that have finished their work, just execute the `__syncthreads()` and any data transfers to/from shared memory that may be needed by other threads, but not the rest of the loop body:

    int flag = 1;
    for (int i=0; i<20; i++) {
        // load data...
        __syncthreads();
        if (flag) {
            // do work...
            if (finished) {
                // this is where the break statement would have been
                flag = 0;
            }
        }
        __syncthreads();
    }
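
For anyone who wants to try the pattern end to end, here is a self-contained sketch of it. The workload (summing a neighbour's value from shared memory until a made-up `limit` is reached), the kernel and helper names, and the input data are all assumptions chosen for illustration, not the original poster's kernel:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

constexpr int kThreads = 64;   // threads per block (illustrative)
constexpr int kMaxIters = 20;  // loop bound mentioned in the question

// Hypothetical workload: each thread accumulates its right neighbour's value
// from shared memory until its running sum reaches `limit`.
__global__ void accumulateUntilLimit(const int* in, int* out, int limit)
{
    __shared__ int tile[kThreads];
    const int tid = threadIdx.x;
    int sum = 0;
    int flag = 1;  // 1 while this thread still has work to do

    for (int i = 0; i < kMaxIters; ++i) {
        // Every thread loads, finished or not: a neighbour may read this slot.
        tile[tid] = in[i * kThreads + tid];
        __syncthreads();  // reached by all threads in every iteration

        if (flag) {
            sum += tile[(tid + 1) % kThreads];
            if (sum >= limit) {
                flag = 0;  // this is where the break would have been
            }
        }
        __syncthreads();  // keeps tile[] intact until all reads are done
    }
    out[tid] = sum;
}

// Runs one block; returns the number of threads whose result differs from a
// CPU reference that uses a plain `break`, or -1 on a CUDA error.
int runFlagExample()
{
    const int limit = 30;
    static int hIn[kMaxIters * kThreads];
    int hOut[kThreads];
    for (int i = 0; i < kMaxIters; ++i)
        for (int t = 0; t < kThreads; ++t)
            hIn[i * kThreads + t] = (t + i) % 7 + 1;

    int *dIn = nullptr, *dOut = nullptr;
    if (cudaMalloc(&dIn, sizeof(hIn)) != cudaSuccess) return -1;
    if (cudaMalloc(&dOut, sizeof(hOut)) != cudaSuccess) { cudaFree(dIn); return -1; }
    cudaMemcpy(dIn, hIn, sizeof(hIn), cudaMemcpyHostToDevice);
    accumulateUntilLimit<<<1, kThreads>>>(dIn, dOut, limit);
    cudaError_t err = cudaMemcpy(hOut, dOut, sizeof(hOut), cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);
    if (err != cudaSuccess) return -1;

    int mismatches = 0;
    for (int t = 0; t < kThreads; ++t) {
        int sum = 0;
        for (int i = 0; i < kMaxIters; ++i) {
            sum += hIn[i * kThreads + (t + 1) % kThreads];
            if (sum >= limit) break;  // sequential code can simply break
        }
        if (sum != hOut[t]) ++mismatches;
    }
    return mismatches;
}

int main()
{
    const int bad = runFlagExample();
    if (bad == 0) printf("all threads match the CPU reference\n");
    else          printf("mismatches or CUDA error: %d\n", bad);
    return bad != 0;
}
```

The host reference uses an ordinary `break`, so a match suggests the flag version computes the same result while still letting every thread reach both barriers in each of the 20 iterations.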
