Can NVLink combine 2x GPUs into 1x Big GPU?

As correctly noted above, NVLink provides a fast interconnect between GPUs, but it does not aggregate them into a single logical device. That said, DL training can usually be spread efficiently across multiple GPUs by increasing the global minibatch size and distributing a different subset of images to each GPU. Horovod is a third-party tool provided in our containers that simplifies parallelizing over multiple GPUs (or even multiple hosts). Alternatively, you can use TensorFlow's Distribution Strategies API.
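To make the idea concrete, here is a minimal pure-Python sketch of the synchronous data-parallel scheme that Horovod and `tf.distribute` automate: the global minibatch is split into per-GPU shards, each worker computes a local gradient, and the gradients are averaged (an allreduce, which is what runs over NVLink) before the weight update. The toy model and numbers are illustrative assumptions, not framework code.

```python
# Toy illustration of synchronous data parallelism.
# Each "worker" stands in for one GPU; real training would use
# Horovod or tf.distribute, with NCCL performing the allreduce.

def shard(batch, num_workers):
    """Split a global minibatch into one shard per worker."""
    n = len(batch) // num_workers
    return [batch[i * n:(i + 1) * n] for i in range(num_workers)]

def local_gradient(examples, w):
    # Toy objective: fit y = 2x with model y = w*x, so the
    # gradient of mean((w*x - 2x)^2) w.r.t. w is mean(2*(w-2)*x^2).
    return sum(2 * (w - 2.0) * x * x for x in examples) / len(examples)

def allreduce_mean(grads):
    """Average gradients across workers (the allreduce step)."""
    return sum(grads) / len(grads)

# Synchronous SGD on 2 "GPUs": the batch is doubled, then sharded.
w = 0.0
global_batch = [0.5, 1.0, 1.5, 2.0]
for _ in range(100):
    grads = [local_gradient(s, w) for s in shard(global_batch, 2)]
    w -= 0.1 * allreduce_mean(grads)

print(round(w, 3))  # converges toward the true weight, 2.0
```

Because every worker applies the same averaged gradient, the result matches single-GPU training on the full batch; the GPUs cooperate without ever being merged into one device.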