GLSL loop 'break' instruction not executed

sylefeb · January 3, 2013, 11:21am

Hi All,

I am using driver 310.70 on a GeForce GTX 480 and a GTX 580 (both show the same issue).
I have a loop as follows:

int success = 0;
while ( success == 0 ) {
    if (something that happens for sure) {
        success = 1;
        break;
    }
}

This never terminates (driver crash). I modified it as:

int success = 0;
int iter = 0;
while ( success == 0 && iter++ < 64) {
    if (something that happens for sure) {
        success = 1;
        break;
    }
}

This terminates, but always with success == 1 (implying that the first version should not loop indefinitely). Performance also depends on the value iter is compared against (using 128 makes it slower), which makes no sense since I verified that all loops terminate after approximately 15 iterations (checked using an atomicMax on a single-entry buffer). The computation is performed correctly, but everything behaves as if the break was skipped (or as if the loop was unrolled, see below).
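
For illustration, the iteration-count check described above could look roughly like the sketch below. It is not the original shader. It assumes a GLSL 4.30 compute shader with a shader storage buffer rather than the NV buffer load/store extensions used in this thread, and `IterStats`, `maxIter` and `tryStep` are hypothetical names; `tryStep` is only a stand-in for the real loop body.

```glsl
#version 430
layout(local_size_x = 64) in;

// Hypothetical single-entry buffer. The host is assumed to clear it to 0
// before the dispatch and to read it back afterwards.
layout(std430, binding = 0) buffer IterStats {
    uint maxIter;
};

// Hypothetical stand-in for "something that happens for sure":
// succeeds after at most 16 iterations.
bool tryStep(uint id, int iter) {
    return uint(iter) > (id % 16u);
}

void main() {
    uint id = gl_GlobalInvocationID.x;
    int success = 0;
    int iter = 0;
    while (success == 0 && iter < 64) {
        iter++;                      // counts iterations actually started
        if (tryStep(id, iter)) {
            success = 1;
            break;
        }
    }
    // Record the largest iteration count seen by any invocation.
    atomicMax(maxIter, uint(iter));
}
```

After the dispatch, `maxIter` holds the worst-case iteration count over all invocations, which is the kind of number the "approximately 15 iterations" figure refers to.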

I verified with cgc 3.1.0013 (profile gp5fp) and the break appears in the generated code. The loop is not unrolled. I tried a variety of pragma unroll, etc. as well as replacing the break by a continue and other tricks but without success.

Has anyone seen a similar issue? Any ideas?

Thanks!

sylefeb · January 3, 2013, 3:04pm

Some more info:

I have an atomic operation in the body of the loop (atomicCompSwap). If I remove it for an (incorrectly non-atomic) equivalent, then the loop behaves as expected.

This looks very much like the atomic is forcing a loop unrolling. Any reason for that?

Edit: I should also specify that this is using:
#version 420
#extension GL_NV_shader_buffer_load : enable
#extension GL_NV_shader_buffer_store : enable
#extension GL_NV_gpu_shader5 : enable

sylefeb · January 5, 2013, 5:24pm

I still did not find a good solution, but I was at least able to get the normal loop behavior. The trick is to add a useless condition using an atomic in the while statement.
In my case I used:

while ( atomicAdd(u_Table + foo,0) > 0 )

where u_Table contains only 1s. I use foo so that the threads do not all access the same memory location.

This is really ugly, but it fixes the loop. The penalty I pay for this is much lower than the penalty of the broken loop … (I tried without an atomic, but no luck)

Note that it does not always work, depending on the code inside the loop. For instance, if I use ‘return’ instead of ‘break’, it stops working properly. So this is really just a hack.

Well, at least I can expect a huge boost in performance when this issue is fixed :-)

msomeone · January 8, 2013, 5:56pm

Got the same issues when I used those extensions for global atomic stores. I never succeeded in using stores in VS or GS. Seems like only FS works :( It always takes a little magic to get stores working. For example, even Cyril Crassin did some magic in his shader to get the A-Buffer sample to work with the load/store extensions.

sylefeb · January 9, 2013, 12:22am

How about the imageAtomic* functions? Are those better supported?

(I cannot trivially test this because I need a 64-bit atomic CAS … but it may be doable nonetheless)

ChristophKubisch · January 9, 2013, 11:51am

If it's related to atomics (coding custom semaphores), the “break” approach doesn't work so well in SIMT. You need to use a different code structure, see below:

int  spin = 10000; // spin counter as safety measure, not required
bool done = false;
while (!done && spin-- > 0)
{
  // try to enter spinlock
  if ( imageAtomicExchange(oitSpinLock, screenPos, uint(1)) == uint(0) )
  {
    ....
    // release spinlock
    imageAtomicExchange(oitSpinLock, screenPos, uint(0));
    // leave spinlock
    done = true;
    // NEVER use break here
  }
}

sylefeb · February 4, 2013, 3:17pm

Sorry it took so long to answer. Christoph, I think you are correct about the issue happening here. I also discussed this problem with Cyril Crassin and he pointed out a modification of the code that solved the problem on Fermi hardware (essentially removing the ‘break’). This gave a huge boost in performance (basically x2 :) ) so many thanks Cyril!!
Unfortunately the problem remains on Kepler. Let’s see how this evolves.
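
To make "removing the break" concrete, a break-free loop around an atomicCompSwap might look like the sketch below. This is only an assumption about the general shape, not Cyril's actual modification. It uses a GLSL 4.30 compute shader and a 32-bit compare-and-swap on a hypothetical `slots` buffer (the real code needs a 64-bit one), and every name in it is made up for illustration.

```glsl
#version 430
layout(local_size_x = 64) in;

// Hypothetical table of slots; 0 means "empty". Assumed to be non-empty
// and cleared to 0 by the host before the dispatch.
layout(std430, binding = 0) buffer Table {
    uint slots[];
};

void main() {
    uint n    = uint(slots.length());
    uint key  = gl_GlobalInvocationID.x + 1u;  // non-zero value to store
    uint slot = key % n;

    bool done = false;
    int  iter = 0;                             // safety cap, as in the thread
    while (!done && iter < 64) {
        // Try to claim the slot: succeeds only if it was still empty.
        if (atomicCompSwap(slots[slot], 0u, key) == 0u) {
            done = true;                       // exit via the loop condition, no break
        } else {
            slot = (slot + 1u) % n;            // probe the next slot
        }
        iter++;
    }
}
```

The only structural difference from the version in the first post is that the exit happens through the loop condition, in the same way as the `done` flag in Christoph's spinlock.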

Thanks!
