What happens for load instructions?

The guide doesn’t seem to be very clear about what happens during a load instruction.

Several behaviours are conceivable:

  1. The thread stalls, so the entire warp stalls. The scheduler tries to find another warp to execute, that warp stalls as well, and so on until all warps are stalled and there are no warps left to run.

  2. The thread continues executing other instructions which do not depend on the load. Once it hits an instruction which does depend on the load, it stalls, and everything else stalls as in 1.

  3. The thread stalls and is swapped with another thread from the block, but the warp continues. (This doesn’t seem to be the case.)

I am starting to suspect it’s case 1. That would mean it’s impossible to hide the latency inside a single thread by executing other instructions in the same thread while the load happens?!

So the claim of “latency hiding” seems exaggerated.

It seems only other warps could be run, but those also stall very quickly, and then everything is stalled?!

The guide should be clearer on this.

Depending on the decision of the scheduler it’s either 1 or 2. Case 3 isn’t possible since the minimal scheduling unit is a warp, not a thread.
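If case 2 applies, a kernel can be written so that each thread has several independent loads in flight before the first dependent instruction. The following is only a sketch under that assumption: the names `sum4_ilp` and `run_sum4_ilp` are made up for illustration, error checking is omitted, and whether the compiler keeps this instruction order and the hardware really overlaps the loads is not guaranteed by anything in this thread.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread issues four independent global loads
// before the first instruction that consumes any of them. Under "case 2"
// the warp would only have to wait at the addition, not at each load.
__global__ void sum4_ilp(const float* __restrict__ in, float* out, int n)
{
    int tid    = blockIdx.x * blockDim.x + threadIdx.x;
    int stride = gridDim.x * blockDim.x;
    float acc  = 0.0f;

    for (int i = tid; i + 3 * stride < n; i += 4 * stride) {
        // Four loads with no dependencies between them.
        float a = in[i];
        float b = in[i + stride];
        float c = in[i + 2 * stride];
        float d = in[i + 3 * stride];
        // First use of the loaded values.
        acc += (a + b) + (c + d);
    }
    out[tid] = acc;
}

// Hypothetical host helper: fills the input with 1.0f, runs the kernel and
// returns the sum over all per-thread results (error checking omitted).
double run_sum4_ilp(int blocks, int threads, int per_thread)
{
    assert(per_thread % 4 == 0);  // the kernel consumes elements in groups of four
    int total = blocks * threads;
    int n     = total * per_thread;

    std::vector<float> h_in(n, 1.0f), h_out(total, 0.0f);
    float *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in,  n * sizeof(float));
    cudaMalloc(&d_out, total * sizeof(float));
    cudaMemcpy(d_in, h_in.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    sum4_ilp<<<blocks, threads>>>(d_in, d_out, n);

    // This copy waits for the kernel to finish before reading the results.
    cudaMemcpy(h_out.data(), d_out, total * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);

    double sum = 0.0;
    for (float v : h_out) sum += v;
    return sum;
}
```

Comparing the run time of this against a variant that uses each value immediately after loading it would be one way to see which of the two cases the hardware actually follows.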

Are you unsure how the scheduler works?

Or do you mean the scheduler can make different decisions? If the latter, is there a way to influence the decision making?

The scheduler decides which of the runnable warps it picks. In this decision it has to take into account several undocumented factors like banking of registers.

I’m not aware of any options to influence the scheduler’s decisions.
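What can at least be inspected is how many warps the scheduler has to choose from. The sketch below uses the runtime call `cudaOccupancyMaxActiveBlocksPerMultiprocessor`; the kernel `dummy_kernel` and the helper `resident_warps_per_sm` are made-up names for illustration, and device 0 is assumed.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel, only here so that there is something to query.
__global__ void dummy_kernel(const float* in, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] * 2.0f;
}

// Hypothetical helper: returns how many warps of dummy_kernel can be
// resident on one multiprocessor for the given block size, or -1 on error.
int resident_warps_per_sm(int block_size)
{
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) return -1;

    int blocks_per_sm = 0;
    if (cudaOccupancyMaxActiveBlocksPerMultiprocessor(
            &blocks_per_sm, dummy_kernel, block_size, 0) != cudaSuccess) {
        return -1;
    }

    // Round up: a partially filled warp still occupies a warp slot.
    int warps_per_block = (block_size + prop.warpSize - 1) / prop.warpSize;
    return blocks_per_sm * warps_per_block;
}
```

This does not change which warp gets picked; it only reports how many candidates are resident, which is what the "other warps" form of latency hiding depends on.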