Limit for goto in CUDA 1.1? 'ptxas' died with ...

Is there a limit on jumps in CUDA 1.1?
I have a pretty large loop with a goto [startlabel] at the end, and compiling it results in:

nvcc error : ‘ptxas’ died with status 0xC0000005 (ACCESS_VIOLATION)

Any suggestions?

A small thing I found is:

if ( a<b ) goto label; // a conditional jump produces the error

can be worked around with:

if ( a>b ) return;
goto label; // no error, but unbelievably slow

Suddenly my code gets incredibly slow, like a far call on the CPU where the instruction cache gets flushed (just a guess).
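
One thing that might be worth trying is expressing the backward jump as a structured loop instead of a goto. The sketch below is only an illustration: the function names, the HOST_DEVICE macro, and the trivial loop body are hypothetical stand-ins for the real code, and I have not verified whether this avoids the ptxas crash or the slowdown on CUDA 1.1. It only shows that a conditional goto at the end of a block and a do/while loop compute the same thing.

```cpp
// Compiles with nvcc as device code, or with a plain C++ compiler for testing.
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// Original pattern: a conditional backward jump at the end of the loop body.
HOST_DEVICE inline int steps_with_goto(int a, int b)
{
    int steps = 0;
label:
    a += 1;                  // stand-in for the (large) loop body
    steps += 1;
    if (a < b) goto label;   // the conditional jump from the post
    return steps;
}

// Same control flow written as a structured do/while loop.
HOST_DEVICE inline int steps_with_loop(int a, int b)
{
    int steps = 0;
    do {
        a += 1;              // same stand-in body
        steps += 1;
    } while (a < b);         // loop back while the condition holds
    return steps;
}
```

Both functions run the body at least once and repeat it while a < b, so they return the same count for any inputs where the loop terminates.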