I’d like to accelerate my Thrust code by running on multiple GPUs.
Is there a shortcut that lets me treat multiple GPUs as a single device, perhaps behind a single CUDA stream? I'm guessing it would be a software abstraction, maybe at the driver level. I imagine there would be some hardware assist, like NVLink, to share memory across the GPUs.
Specifically, I have in mind the AWS p3dn.24xlarge instances, which have 8x V100 GPUs.
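For context, here is a sketch of the manual per-device pattern I'm hoping a shortcut would let me avoid: switching devices with cudaSetDevice, running a separate Thrust call on each GPU, and combining partial results on the host. (The sizes and the even split are just illustrative assumptions; it needs nvcc and at least one GPU to run.)

```cuda
// Manual multi-GPU reduction with Thrust: one slice per device,
// combined on the host. Assumes n divides evenly across the GPUs.
#include <thrust/device_vector.h>
#include <thrust/reduce.h>
#include <vector>
#include <cstdio>

int main() {
    int num_gpus = 0;
    cudaGetDeviceCount(&num_gpus);

    const size_t n = 1 << 24;          // total elements (illustrative)
    const size_t chunk = n / num_gpus; // per-GPU slice

    std::vector<double> partial(num_gpus);
    for (int d = 0; d < num_gpus; ++d) {
        cudaSetDevice(d);  // make device d current for the allocations below
        thrust::device_vector<double> v(chunk, 1.0);
        // Per-GPU partial sum; runs on whichever device is current.
        partial[d] = thrust::reduce(v.begin(), v.end());
    }

    double total = 0.0;
    for (double p : partial) total += p;  // combine partials on the host
    std::printf("total = %f\n", total);
    return 0;
}
```

As written this loop is serialized across devices; overlapping the GPUs would mean one host thread (or stream) per device, which is exactly the bookkeeping I'd like an abstraction to hide.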
Thanks in advance if you can educate me about this.
Thanks,
Hugh