Assuming a system has an N-core CPU and N GPUs, simpleMultiGPU spawns N threads, one per GPU. From my (limited) understanding, there are then N + 1 host threads (the main calling thread plus the N spawned threads) running on an N-core CPU. My question is: which of these is better?
(A) N GPUs, N + 1 CPU threads (including the main calling thread)
or
(B) N GPUs, N CPU threads (including the main calling thread)
(B) means managing one GPU (e.g. device 0) on the main calling thread and creating new threads only for the remaining GPUs.
I looked at MonteCarloMultiGPU, and it uses (A): N + 1 CPU threads, including the main calling thread.