GLSL loop 'break' instruction not executed

Hi All,

I am using driver 310.70 on a GeForce GTX 480 and a GTX 580 (both show the same issue).
I have a loop as follows:

int success = 0;
while ( success == 0 ) {
    if (something that happens for sure) {
        success = 1;
        break;
    }
}

This never terminates (the driver crashes). I modified it as follows:

int success = 0;
int iter = 0;
while ( success == 0 && iter++ < 64 ) {
    if (something that happens for sure) {
        success = 1;
        break;
    }
}

This terminates, and always with success == 1 (implying that the first version should not loop indefinitely). Performance also depends on the bound iter is compared against (128 is slower than 64), which makes no sense since I verified that all loops terminate after approximately 15 iterations (checked using an atomicMax on a single-entry buffer). The computation is performed correctly, but everything behaves as if the break were skipped (or as if the loop were unrolled, see below).

I checked with cgc 3.1.0013 (profile gp5fp): the break appears in the generated code, and the loop is not unrolled. I tried various unroll pragmas, replacing the break with a continue, and other tricks, all without success.

Has anyone seen a similar issue? Any ideas?


Some more info:

I have an atomic operation (atomicCompSwap) in the body of the loop. If I replace it with an (incorrect) non-atomic equivalent, the loop behaves as expected.

This looks very much as if the atomic forces the loop to be unrolled. Is there any reason for that?

Edit: I should also specify that this uses:
#version 420
#extension GL_NV_shader_buffer_load : enable
#extension GL_NV_shader_buffer_store : enable
#extension GL_NV_gpu_shader5 : enable

I still have not found a good solution, but I was at least able to restore normal loop behavior. The trick is to add a useless condition involving an atomic to the while statement.
In my case I used:

while ( atomicAdd(u_Table + foo,0) > 0 )

where u_Table contains only 1s. I use foo so that not all threads access the same memory location.

This is really ugly, but it fixes the loop: the penalty I pay for the extra atomic is much lower than the penalty of the broken loop. (I tried the same trick without an atomic, but no luck.)

Note that it does not always work; it depends on the code inside the loop. For instance, if I use 'return' instead of 'break', it stops working properly. So this is really just a hack.

Well, at least I can expect a huge boost in performance when this issue is fixed :-)

I got the same issues when using those extensions for global atomic stores, and I never succeeded in using stores in the VS or GS; only the FS seems to work :( It always takes a little magic to make stores work. For example, even Cyril Crassin had to do some magic in his shader to get the A-Buffer sample working with the load/store extensions.

How about imageAtomic*? Are those better supported?

(I cannot test this trivially because I need a 64-bit atomicCAS, but it may be doable nonetheless.)

If it's related to atomics (coding custom semaphores), the 'break' approach doesn't work well under SIMT; you need a different code structure, see below:

int  spin = 10000; // spin counter as safety measure, not required
bool done = false;
while (!done && spin-- > 0) {
    // try to enter spinlock
    if ( imageAtomicExchange(oitSpinLock, screenPos, uint(1)) == uint(0) ) {
        // ... critical section ...
        // release spinlock
        imageAtomicExchange(oitSpinLock, screenPos, uint(0));
        // leave spinlock
        done = true;
        // NEVER use break here
    }
}
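For contrast, here is a hypothetical sketch of the 'break' structure that comment warns against, reusing the names oitSpinLock and screenPos (their declarations below are assumptions, not taken from the original shader). A commonly given explanation for why this can hang on pre-Volta NVIDIA GPUs such as Fermi and Kepler: the lane that wins the lock leaves the loop and is parked at the loop exit until the rest of its warp reconverges there, while the losing lanes keep spinning on a lock that is only released after the loop. Keeping the release inside the guarded branch, as in the loop above, lets the winning lane unlock before the warp reconverges.

```glsl
#version 420
// ANTI-PATTERN, hypothetical sketch: can hang on SIMT GPUs without
// independent thread scheduling when lanes of one warp contend for the
// same lock. Shown only to contrast with the 'done' flag version.
layout(binding = 0, r32ui) coherent uniform uimage2D oitSpinLock; // assumed declaration

void main()
{
    ivec2 screenPos = ivec2(gl_FragCoord.xy); // assumed: one lock per pixel

    while (true) {
        // try to enter spinlock; the winning lane leaves the loop...
        if (imageAtomicExchange(oitSpinLock, screenPos, uint(1)) == uint(0))
            break;
        // ...but may wait at the loop exit while the losing lanes keep
        // spinning here on a lock that is still held
    }

    // critical section would go here

    // release spinlock: in the SIMT model described above, only reached
    // once every lane of the warp has left the loop
    imageAtomicExchange(oitSpinLock, screenPos, uint(0));
}
```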

Sorry it took so long to answer. Christoph, I think you are right about what is happening here. I also discussed this problem with Cyril Crassin, and he pointed out a modification of the code that solved the problem on Fermi hardware (essentially removing the 'break'). This gave a huge performance boost (basically 2x :) ), so many thanks, Cyril!!
Unfortunately, the problem remains on Kepler. Let’s see how this evolves.
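Cyril's actual modification is not shown in this thread. As a rough illustration of what "removing the break" can look like for a compare-and-swap retry loop like the one in the first post, here is a minimal sketch that uses the same flag-in-the-condition structure as Christoph's spinlock loop. It assumes GLSL 4.30 shader storage buffers instead of the NV_shader_buffer_load/store pointers used in this thread; the Values buffer, u_Delta and addFloatCAS are hypothetical names, and the CAS-based float add merely stands in for whatever the real loop body does. Given the Kepler report above, this structure is not guaranteed to sidestep the compiler issue on every GPU.

```glsl
#version 430
// Hypothetical compute shader: every invocation adds u_Delta to values[0]
// with a compare-and-swap retry loop that leaves the loop only through its
// condition (no 'break' and no 'return' inside the loop body).
layout(local_size_x = 64) in;

layout(std430, binding = 0) coherent buffer Values {
    uint values[]; // float bit patterns stored as uint
};

uniform float u_Delta; // hypothetical: amount each invocation adds

// Returns false if the safety cap was hit (the update is then dropped).
bool addFloatCAS(uint idx, float delta)
{
    int  spin = 10000;           // safety cap, as in the spinlock loop
    bool success = false;
    uint expected = values[idx]; // initial guess; the CAS validates it
    while (!success && spin-- > 0) {
        uint desired = floatBitsToUint(uintBitsToFloat(expected) + delta);
        uint prev = atomicCompSwap(values[idx], expected, desired);
        if (prev == expected) {
            success = true;      // our update was published
        } else {
            expected = prev;     // another invocation won; retry with its value
        }
    }
    return success;
}

void main()
{
    addFloatCAS(0u, u_Delta);
}
```

Unlike the spinlock, no lock is held across iterations: a failed CAS only means some other invocation's update went through, so the system as a whole keeps making progress, and lanes that finish early simply idle at the loop exit until the rest of the warp is done.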