Break from loop inside conditional... Weird behaviour...

Hey all,

I’m encountering some strange instructions being emitted by nvcc with the following code…

[codebox]for(unsigned int index = 0; index < count; index += blockDim.x)


value = some_values[index];

if(value < threshold)



// index is always +blockDim.x more than expected for the case that the loop was broken.[/codebox]

Note: The same behaviour can be seen with similar looping code, eg: “while(true) { if(some_non_const_true_expression) break; }” will stall forever (windows will break with watchdog timer).

The expected behaviour for the case s.t. value < threshold, would be to break out of the loop completely… however the loop appears to continue no matter what, as if nvcc is either interpreting break as continue, or break is only breaking from the current scope (in this case the scope of the if statement).

Has this problem already been identified before? (for now I’ve had to put the iteration statement after my conditional break)

CUDA v2.1, Windows XP/Vista 32bit, 8800 GT - Forceware 181.20

I’ll look into de-cudafying it now, to get more details.

Some time ago I encountered a similar problem - but only if there was too much code in the loop.
Is there a limitation for relative jumps ?
At the moment I’m using many breaks and continues without problem.

Having looked at the instructions (from decuda), it’s not doing anything wrong… I just made a false assumption (I assumed the formal parameters were being passed in correctly to my kernel, which was wrong).

My problem was actually the fact that unsigned chars’s in the formal parameter list aren’t 1-byte aligned (didn’t bother experimenting to figure it out, but I’m guessing they’re 2 or 4 byte aligned)… and of course the programming guide ‘still’ doesn’t identify alignment requirements of all primitive types (in fact, the documentation has gotten worse from CUDA 1.x to 2.x, telling you to use __alignof for ‘everything’ (wrong), and then their code sample doesn’t even use __alignof… go figure.)

So yeah, issue to do with the formal parameter list of the function I was testing everything in (unsigned char parameter and all params after it were corrupt (incorrect values), thus making my assumption that if(x == y) false, thus never breaking).

I’d be very interested in what alignment assumptions you were making that were not true. When you say they weren’t 1-byte aligned, what does that mean exactly?

The programming guide does say that any allocation from cudaMalloc will be at least 256-byte aligned.

Hmm, I probably shouldn’t have used the term byte-aligned in relation to my unsigned char specifically - because unsigned chars probably don’t have alignment requirements - but rather the pointer after my unsigned char did… (sorry for messing tht detail up.)

Pointers in formal parameter lists (if I’m not mistaken) have a byte alignment of 4 bytes (meaning you have to pad any data before that formal parameter by incrementing the offset appropriately) - and I was not doing that (I’ve made this mistake before… more than once), i fixed my problem by changing my formal parameter from an unsigned char to an unsigned int (1 byte to 4 bytes), but I could’ve just as simply fixed it by incrementing the parameter offset another 3 bytes after setting my unsigned char parameter.

Other primitives have similar alignment requirements (uint2, uint4, float4, etc - all have alignment requirements in formal parameter lists), not that this is clearly documented (I had Tim Murray clear this up for me a few months ago in another thread).

Note: This only applies to the Driver API - the Runtime API automatically handles alignment for formal parameters behind the scenes…

I’ve probably made other factual mistakes in this ramble… feel free to point things out if they don’t make sense.

Ah I see. This makes more sense, although it is beyond my experience because so far I have only used the runtime API.