Question about cudaSetDevice and multiple host threads

Suppose you have two GPUs and two host threads (0 and 1), and the calls from these two threads interleave in the following way:

thread 0: cudaSetDevice(0)
thread 1: cudaSetDevice(1)
thread 0: cudaMalloc(…)

Is cudaMalloc going to be performed on GPU 0 or 1?

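Never mind. It’s GPU 0. The documentation explains it: cudaSetDevice sets the current device for the calling host thread only, so thread 1's call to cudaSetDevice(1) has no effect on thread 0.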
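For anyone who wants to check this on their own machine, here is a minimal sketch that forces the exact interleaving from the question. It assumes at least two visible GPUs and a C++11 host compiler. The std::promise/std::future pairs are my own addition, used only to order the calls, and error checking is mostly omitted for brevity.

```cuda
#include <cstdio>
#include <future>
#include <thread>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count < 2) {
        std::printf("This sketch needs at least 2 GPUs (found %d).\n", count);
        return 0;
    }

    // Promises/futures are used only to force the interleaving from the question.
    std::promise<void> t0_set, t1_set;
    std::future<void> t0_done = t0_set.get_future();
    std::future<void> t1_done = t1_set.get_future();

    std::thread t0([&] {
        cudaSetDevice(0);                 // step 1: thread 0 selects GPU 0
        t0_set.set_value();
        t1_done.wait();                   // step 2 has happened: thread 1 selected GPU 1

        int current = -1;
        cudaGetDevice(&current);          // current device as seen by thread 0

        void* p = nullptr;
        cudaMalloc(&p, 1 << 20);          // step 3: allocate 1 MiB from thread 0

        // Ask the runtime which device owns the allocation.
        cudaPointerAttributes attr{};
        cudaPointerGetAttributes(&attr, p);
        std::printf("thread 0: current device %d, allocation on device %d\n",
                    current, attr.device);
        cudaFree(p);
    });

    std::thread t1([&] {
        t0_done.wait();                   // wait until thread 0 has selected GPU 0
        cudaSetDevice(1);                 // affects thread 1 only
        t1_set.set_value();
    });

    t0.join();
    t1.join();
    return 0;
}
```

Compiled with something like nvcc -std=c++11 (plus whatever threading flag your host compiler needs), thread 0 should report device 0 for both values, since the current device is tracked per host thread.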

