Do false-predicated loads actually do nothing?

Here’s another advanced usage question that probably nobody has an answer to:

The PTX documentation says that an instruction guarded by a predicate that evaluates to false does nothing in that thread.

Has anybody observed that this is not actually the case with loads (and maybe stores)? Say I do this:

    setp.eq.u32 p, out_addr, 0;
    @p ld.global.b32 val, [0];

Here out_addr is a value loaded from constant memory that is never zero, so p always evaluates to false. The load from ‘null’ is guaranteed to crash if it actually goes ahead. What I find is that my kernels containing such statements do indeed crash, even though the load shouldn’t be executed.
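For anyone who wants to try this, here is a rough sketch of how the snippet above could be wrapped in a standalone test using inline PTX. The names predicated_null_load and run_predicated_load_repro are made up for illustration. I'm also assuming a 64-bit build, so the compare is u64 rather than u32 and the null address is passed in a register instead of as an immediate; the out_addr value comes in as a kernel parameter rather than from my real constant-memory setup.

```cuda
#include <cuda_runtime.h>

// Hypothetical repro kernel (names are illustrative, not from real code).
// p is true only when out_addr == 0, so for a non-zero out_addr the guarded
// load from address 0 should be a no-op and val should keep its initial 0.
__global__ void predicated_null_load(unsigned long long out_addr,
                                     unsigned int *out)
{
    unsigned int val = 0;
    asm volatile(
        "{\n\t"
        ".reg .pred p;\n\t"
        "setp.eq.u64 p, %1, 0;\n\t"        // p = (out_addr == 0)
        "@p ld.global.b32 %0, [%2];\n\t"   // load from null, guarded by p
        "}"
        : "+r"(val)
        : "l"(out_addr), "l"(0ULL)
        : "memory");
    out[threadIdx.x] = val;
}

// Launches one warp and returns the status seen after synchronization.
cudaError_t run_predicated_load_repro()
{
    const int n = 32;
    unsigned int *d_out = nullptr;
    cudaError_t err = cudaMalloc(&d_out, n * sizeof(unsigned int));
    if (err != cudaSuccess) return err;

    // Assumption: the output pointer doubles as a non-zero out_addr,
    // so the predicate is false in every thread.
    predicated_null_load<<<1, n>>>(
        reinterpret_cast<unsigned long long>(d_out), d_out);
    err = cudaGetLastError();
    if (err == cudaSuccess) err = cudaDeviceSynchronize();

    cudaFree(d_out);
    return err;
}
```

If the predicate really suppresses the load, run_predicated_load_repro() should come back with cudaSuccess; a fault from the null load would instead surface as an error from cudaDeviceSynchronize().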

Now perhaps predicated execution actually executes the statement regardless of the predicate and throws away the value later if the predicate is false. However, this doesn’t really fit with the docs.

The major problem here is with read coalescing. We’re told that we can get coalescing even if some threads don’t participate. But how can a thread ‘not participate’ if predicating the load off doesn’t actually suppress it?
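To make ‘not participate’ concrete, this is the kind of pattern I have in mind at the CUDA C level. It is only a sketch: guarded_copy and run_guarded_copy are hypothetical names, and whether the compiler turns the guard into a branch or into a predicated load is its own choice, not something this code controls. The helper assumes a non-empty input and skips error checking for brevity.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Threads with i >= n skip the load and store entirely; the remaining
// threads of the warp still read consecutive addresses.
__global__ void guarded_copy(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = in[i];
}

// Copies src through the device and returns the result (src must be non-empty).
std::vector<float> run_guarded_copy(const std::vector<float> &src)
{
    const int n = static_cast<int>(src.size());
    const size_t bytes = n * sizeof(float);
    std::vector<float> dst(n, -1.0f);

    float *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_in, src.data(), bytes, cudaMemcpyHostToDevice);

    // Round the grid up, so the last block has threads that sit out.
    const int threads = 128;
    const int blocks = (n + threads - 1) / threads;
    guarded_copy<<<blocks, threads>>>(d_in, d_out, n);

    cudaMemcpy(dst.data(), d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return dst;
}
```

With 100 elements and 128 threads per block, the last 28 threads are the ones that should sit out without touching memory.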

Anybody else have some insight?