The end of a for(;;) loop generates this PTX code in the SDK 1.1 beta with -O3:
@$p2 bra $Lt_0_10; //
bra.uni $Lt_0_8; //
$Lt_0_14:
$Lt_0_8:
This surely must be a bug? It looks like something between $Lt_0_14 and $Lt_0_8 was optimized away, but the compiler didn’t realize that the bra.uni now jumps to the very next instruction and has become a nop?
nvcc: Built on Wed_Nov__7_03:26:42_PST_2007
Cuda compilation tools, release 1.1, V0.2.1221
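
For anyone who wants to check their own PTX output for the same pattern, here is a rough Python sketch. It is not part of nvcc or the SDK: the function name redundant_branches and the two regular expressions are my own assumptions about how the PTX text is laid out (one statement per line, "//" starting a comment), so treat it as an illustration rather than a real PTX parser. It flags any bra whose target label follows it with nothing but other labels in between, which is what I mean by the branch being a nop.

```python
import re

# The four PTX statements quoted in this post, one per line.
PTX = """\
@$p2 bra $Lt_0_10; //
bra.uni $Lt_0_8; //
$Lt_0_14:
$Lt_0_8:
"""

# Assumed shapes: a label is "name:", a branch is "[@pred] bra[.uni] target;".
LABEL = re.compile(r"^(\$?\w+):$")
BRANCH = re.compile(r"^(?:@!?\$?\w+\s+)?bra(?:\.uni)?\s+(\$?\w+)\s*;")

def redundant_branches(ptx):
    """Return targets of branches that only skip over labels (no instructions)."""
    # Drop "//" comments and blank lines.
    lines = [ln.split("//")[0].strip() for ln in ptx.splitlines()]
    lines = [ln for ln in lines if ln]
    found = []
    for i, line in enumerate(lines):
        m = BRANCH.match(line)
        if not m:
            continue
        target = m.group(1)
        for nxt in lines[i + 1:]:
            lab = LABEL.match(nxt)
            if lab is None:
                break  # a real instruction sits between branch and target
            if lab.group(1) == target:
                found.append(target)  # only labels were skipped: branch is a nop
                break
    return found

print(redundant_branches(PTX))  # ['$Lt_0_8']
```

The conditional branch to $Lt_0_10 is not flagged because the next statement is an instruction, while the bra.uni to $Lt_0_8 only skips the label $Lt_0_14.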