Can a warp exit before a global memory load is finished?

In my kernel, some values in global memory may be needed somewhere in the middle of the computation. What I want to do is issue the load at the beginning of the kernel, do some computation with values I already have, and then use the data from global memory, since by that time it should have arrived.
However, the warp may exit almost right after being created, before using these values.
So my question is: does the warp have to wait until the load is done, or can it exit at no additional cost?

A somewhat related question: how many cycles does it take to schedule a warp?

If you don’t reference the data you’re loading anywhere in your code, the compiler will optimize that load away. If you do reference the result of the load, then that reference will typically be the point of synchronization (though it may come earlier depending on what else is going on, as synchronization resources are limited). Conditional code won’t change this.
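To make the pattern under discussion concrete, here is a minimal sketch of a kernel that issues the load first, does independent arithmetic, and only then consumes the loaded value. The kernel name, parameters, and exit condition are all hypothetical and are not taken from the original poster's code. Note also that the compiler is free to move the load, so the source order is only a hint; inspecting the generated SASS (for example with `cuobjdump -sass`) is the way to see where the load actually ends up.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: all names and the exit condition are illustrative.
__global__ void earlyLoadKernel(const float* __restrict__ table,
                                float* __restrict__ out,
                                int n, float scale, float threshold)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    // Load written early in the source. Its result is not needed yet, so
    // independent instructions can execute while the request is in flight.
    float late = table[i];

    // Work that does not depend on `late`.
    float x = scale * static_cast<float>(i);
    float acc = x * x + 1.0f;

    // Possible early exit: `late` is never consumed on this path.
    if (acc > threshold) {
        out[i] = acc;
        return;
    }

    // First use of `late`: typically the point where the thread waits
    // if the load has not completed yet.
    out[i] = acc + late;
}
```

Whether the early-exit path pays anything for the outstanding load is exactly the open question in this thread; the sketch only shows where the dependency sits.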

So the compiler will generally ensure all loads are complete prior to exit. And I wouldn’t be surprised if the EXIT instruction itself forced a sync on any outstanding memory ops.

Once loaded, a warp has a zero clock cost for context switch with another active warp. I’m not sure what the cost is in loading the warp in the first place, but that can likely be hidden by other active warps.

Basically my function is very similar to cunn_LookupTable_accGradParametersKernel here: , and I feel that it could be faster, but I’m not really sure how.

Shakespeare once said that conditionality and probability share the same bed.

If this is true, then you might gain by favouring the most probable path.
If the condition is evaluated sufficiently late, you may hide the memory loads by placing them early on as you intend, provided that the attempt is not sabotaged by intermediate memory barriers, as scottgray also points out.
If the above does not hold, favour the most probable path: if, on average, most warps exit early, you might wish to postpone the reads, and vice versa.
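As a rough illustration of the "postpone the reads" option, the sketch below places the global load after the exit test, so threads that leave early never issue it. The kernel name, parameters, and condition are hypothetical; which placement is faster depends on how often the early exit is taken and would need to be measured on the real kernel.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: all names and the exit condition are illustrative.
__global__ void lateLoadKernel(const float* __restrict__ table,
                               float* __restrict__ out,
                               int n, float scale, float threshold)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    // Work that needs no global memory input.
    float x = scale * static_cast<float>(i);
    float acc = x * x + 1.0f;

    // Assumed to be the common case: exit before touching `table`.
    if (acc > threshold) {
        out[i] = acc;
        return;
    }

    // Only threads that reach this point issue the load, and its
    // latency is no longer overlapped with the arithmetic above.
    out[i] = acc + table[i];
}
```

The trade-off is that the surviving threads see the full load latency, which is why this ordering only makes sense when the early exit is the most probable path.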