In my kernel, some values stored in global memory may be needed partway through the computation. What I want to do is issue the load at the beginning of the kernel, do some computation with values I already have in registers, and then use the loaded data, which should have arrived by that point.
However, the warp may exit almost immediately after it is created, before it ever uses these values.
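
For concreteness, here is a minimal sketch of the pattern I mean. All names (`prefetch_then_use`, `independent_work`, `late_data`, `active_n`, `run_demo`) and the arithmetic are made up for illustration, and I am assuming the compiler keeps the load where I wrote it; it is free to move the load below the early exit, so the generated SASS would have to be checked to see what actually happens.

```cuda
#include <cmath>
#include <vector>
#include <cuda_runtime.h>

// Illustrative stand-in for "computations with values that I already have":
// it uses only the thread index, no memory traffic.
__host__ __device__ inline float independent_work(int i)
{
    float acc = 0.001f * static_cast<float>(i);
    for (int k = 0; k < 16; ++k) acc = acc * 1.0001f + 0.5f;
    return acc;
}

__global__ void prefetch_then_use(const float* __restrict__ late_data,
                                  float* __restrict__ out,
                                  int n, int active_n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    // Load written early in the source. The value is not consumed yet.
    // Assumption: the compiler does not sink this below the early exit.
    float late = late_data[i];

    // Early exit that does not depend on `late`. With a block size that is a
    // multiple of 32 and active_n a multiple of 32, whole warps leave here.
    if (i >= active_n) return;

    // Work that does not depend on the load.
    float acc = independent_work(i);

    // First real use of the loaded value.
    out[i] = acc + late;
}

// Host helper: runs the kernel and returns the number of mismatching
// elements, or -1 if a CUDA call reported an error.
int run_demo(int n, int active_n)
{
    std::vector<float> h_late(n), h_out(n, -1.0f);
    for (int i = 0; i < n; ++i) h_late[i] = 0.25f * static_cast<float>(i % 8);

    float* d_late = nullptr;
    float* d_out = nullptr;
    size_t bytes = static_cast<size_t>(n) * sizeof(float);
    if (cudaMalloc(&d_late, bytes) != cudaSuccess) return -1;
    if (cudaMalloc(&d_out, bytes) != cudaSuccess) { cudaFree(d_late); return -1; }

    cudaMemcpy(d_late, h_late.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemset(d_out, 0, bytes);  // threads that exit early leave 0.0f behind

    int threads = 128;
    int blocks = (n + threads - 1) / threads;
    prefetch_then_use<<<blocks, threads>>>(d_late, d_out, n, active_n);

    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    if (err == cudaSuccess)
        err = cudaMemcpy(h_out.data(), d_out, bytes, cudaMemcpyDeviceToHost);

    cudaFree(d_late);
    cudaFree(d_out);
    if (err != cudaSuccess) return -1;

    int mismatches = 0;
    for (int i = 0; i < n; ++i) {
        float expected = (i < active_n) ? independent_work(i) + h_late[i] : 0.0f;
        float tol = 1e-3f * fmaxf(1.0f, fabsf(expected));
        if (fabsf(h_out[i] - expected) > tol) ++mismatches;
    }
    return mismatches;
}
```

Calling `run_demo(1024, 512)` from `main` launches 8 blocks of 128 threads, so every warp in the last 4 blocks takes the early exit right after the load would have been issued. That is exactly the case I am asking about.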
So my question is: does the warp have to wait until the outstanding load completes, or can it exit at no additional cost?
A somewhat related question: how many cycles does it take to schedule a warp?