Suppose you have 2 GPUs and two host threads (0 and 1), and these two threads interleave in the following way:
thread 0: cudaSetDevice(0)
thread 1: cudaSetDevice(1)
thread 0: cudaMalloc(…)
Is cudaMalloc going to be performed on GPU 0 or 1?
Never mind — it's GPU 0. The device selected by cudaSetDevice is per host thread, so thread 1's call has no effect on thread 0. The documentation explains it: CUDA Runtime API :: CUDA Toolkit Documentation
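Here is a minimal sketch (not from the original post) that reproduces the interleaving and checks where the allocation landed, assuming a machine with at least two GPUs. It uses cudaPointerGetAttributes to query the owning device of the pointer returned by cudaMalloc:

```cpp
// Sketch: cudaSetDevice is per host thread, so another thread's
// cudaSetDevice(1) does not change this thread's selected device.
// Assumes a system with at least 2 CUDA-capable GPUs.
#include <cuda_runtime.h>
#include <cstdio>
#include <thread>

int main() {
    cudaSetDevice(0);                 // "thread 0" selects GPU 0

    std::thread t1([] {
        cudaSetDevice(1);             // "thread 1" selects GPU 1 for itself only
    });
    t1.join();                        // thread 1's call has completed

    void* p = nullptr;
    cudaMalloc(&p, 1 << 20);          // still runs on GPU 0 in this thread

    cudaPointerAttributes attr;
    cudaPointerGetAttributes(&attr, p);
    printf("allocation resides on device %d\n", attr.device);  // expect 0
    cudaFree(p);
    return 0;
}
```

Because the current device is thread-local state in the CUDA runtime, each host thread must call cudaSetDevice itself; one thread's selection never leaks into another's.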