Optimization bug? CUDA 1.1 beta

With -O3, the SDK 1.1 beta compiles the end of a for(;;) loop to this PTX code:

        @$p2 bra        $Lt_0_10;
        bra.uni         $Lt_0_8;

Surely this must be a bug? It looks as if something between Lt_0_14 and Lt_0_8 was optimized away, but the compiler didn't notice that the branch had become a no-op.

nvcc: Built on Wed_Nov__7_03:26:42_PST_2007

Cuda compilation tools, release 1.1, V0.2.1221

– Kuisma