Unified programming model across multiple GPUs

hwinkler · January 25, 2018, 8:51pm

As best I understand, when multiple V100 GPUs are connected by NVLink, I get a unified memory model, yet I must still write host code that launches kernels in separate streams on each GPU.
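To make the situation concrete, here is a minimal sketch of the per-GPU launch pattern I mean. The kernel `scale`, the array size, and the even split across devices are illustrative assumptions, not taken from any real application. It also assumes GPUs that support concurrent managed access (Pascal or newer on Linux), and it omits error checking for brevity.

```cuda
#include <algorithm>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: multiply elements [begin, end) of x by a.
__global__ void scale(float *x, size_t begin, size_t end, float a) {
    size_t i = begin + blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < end) x[i] *= a;
}

int main() {
    int ndev = 0;
    cudaGetDeviceCount(&ndev);
    if (ndev == 0) { printf("no CUDA devices found\n"); return 1; }

    // One managed allocation, visible to the host and to every GPU.
    const size_t n = 1 << 24;  // illustrative size
    float *x = nullptr;
    cudaMallocManaged(&x, n * sizeof(float));
    for (size_t i = 0; i < n; ++i) x[i] = 1.0f;

    // The memory is unified, but the work is not: the host code still
    // has to split the range and launch one kernel per device.
    std::vector<cudaStream_t> streams(ndev);
    const size_t chunk = (n + ndev - 1) / ndev;
    for (int d = 0; d < ndev; ++d) {
        cudaSetDevice(d);
        cudaStreamCreate(&streams[d]);
        size_t begin = (size_t)d * chunk;
        size_t end = std::min(begin + chunk, n);
        if (begin >= end) continue;
        unsigned blocks = (unsigned)((end - begin + 255) / 256);
        scale<<<blocks, 256, 0, streams[d]>>>(x, begin, end, 2.0f);
    }

    // Wait for every device before the host touches the data again.
    for (int d = 0; d < ndev; ++d) {
        cudaSetDevice(d);
        cudaStreamSynchronize(streams[d]);
        cudaStreamDestroy(streams[d]);
    }

    // Verify that every element was scaled exactly once.
    size_t bad = 0;
    for (size_t i = 0; i < n; ++i) bad += (x[i] != 2.0f);
    printf("%d device(s), %zu mismatches\n", ndev, bad);

    cudaFree(x);
    return bad == 0 ? 0 : 1;
}
```

The partitioning loop and the per-device synchronization are exactly the parts I would like a driver-level "unified stream" to take over.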

Could NVIDIA design a driver that offers a unified stream, one that knows how to distribute my kernel across all the SMs on all the cards in the system, transparently to me?

Am I asking the impossible?

Or am I asking for something that already exists?

I would be grateful if someone could point me to any existing discussion along these lines.
