Nowadays, can I program in CUDA targeting only one GPU (`cudaSetDevice(0)`) and have the driver divide the workload across 2 GPUs automatically, i.e., treat the 2 GPUs (a cluster) as one?
Can CUDA take advantage of a SLI connection in this way?
cublasXt and cufftXt can solve a single problem (e.g. a matrix multiply, or an FFT) on 2 or more GPUs "automatically". I'm not aware of any such capability in cuDNN.
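To illustrate, here is a minimal sketch of a multi-GPU GEMM with cublasXt. It assumes 2 GPUs with device ids 0 and 1, and omits error checking; you still write single-threaded host code, and the library tiles the problem across the selected devices:

```cpp
// Sketch: multi-GPU SGEMM via cublasXt (assumes devices 0 and 1 exist).
// Compile with: nvcc example.cpp -lcublas
#include <cublasXt.h>
#include <cstdlib>

int main() {
    const int n = 1024;
    // cublasXt accepts plain host pointers; it stages tiles to the GPUs itself.
    float *A = (float*)malloc(n * n * sizeof(float));
    float *B = (float*)malloc(n * n * sizeof(float));
    float *C = (float*)malloc(n * n * sizeof(float));
    // ... fill A and B ...

    cublasXtHandle_t handle;
    cublasXtCreate(&handle);

    int devices[2] = {0, 1};                     // split the work across these GPUs
    cublasXtDeviceSelect(handle, 2, devices);

    const float alpha = 1.0f, beta = 0.0f;
    cublasXtSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                  &alpha, A, n, B, n, &beta, C, n);

    cublasXtDestroy(handle);
    free(A); free(B); free(C);
    return 0;
}
```

Note that this is opt-in per library call; it is not the driver transparently fusing two GPUs into one device.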
But I do not understand: the NVIDIA DGX-2 Professional Computing Solution page at https://videocardz.net/nvidia-dgx-2/ states "81920 Unified Cores", which makes it sound as if all the cores are unified into one.