Sharing a GPU server for CUDA programming in a multi-user operating system

We are going to set up a GPU server with two quad-core Xeon CPUs and four Tesla C2050 GPUs for the students of a graduate CUDA programming course. Several students should be able to use the server simultaneously and do their CUDA programming assignments on it. We have two options for the OS: Linux or Windows Server 2008 Enterprise Edition.
Here I have two questions:
1- Which of the two operating systems is better for this purpose, and why?
2- Is it possible for all students to log in to the server simultaneously (via SSH on Linux or Remote Desktop on Windows) and run their programming assignments at the same time? I wonder what happens in this scenario and how the four GPUs will be assigned to the users. As far as I know, each user will be able to use all four GPUs, and on these Fermi-based Tesla cards up to 16 kernels can execute concurrently. Am I right?

I have only worked with CUDA on a single-user Windows machine with a single GPU, so I have no idea what exactly happens on a multi-user machine in terms of GPU resource sharing.

Any advice you could give me would be greatly appreciated.
Thank you all for your help.

In the default boot-up configuration, if your students do not call cudaSetDevice() explicitly, they will all run their kernels on the same device. All CUDA devices support time slicing between different processes at kernel boundaries (though in the distant past there were bugs). Note that there is no preemption, and that the concurrent kernel execution supported on Fermi applies only to kernels issued from the same CUDA context in different streams; kernels from different user processes cannot run exactly simultaneously.
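For example, one simple convention is to have each student select a device explicitly before doing anything else. The sketch below is only an illustration; the STUDENT_GPU environment variable is a made-up name, not anything CUDA defines.

/* Minimal sketch: pick a device explicitly instead of defaulting to device 0.
   STUDENT_GPU is an illustrative convention, not a standard CUDA variable. */
#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        fprintf(stderr, "No CUDA devices found\n");
        return 1;
    }

    /* Each student exports e.g. STUDENT_GPU=2 before running their program. */
    const char *env = getenv("STUDENT_GPU");
    int dev = env ? atoi(env) % count : 0;

    cudaError_t err = cudaSetDevice(dev);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaSetDevice(%d) failed: %s\n", dev, cudaGetErrorString(err));
        return 1;
    }
    printf("Running on device %d of %d\n", dev, count);

    /* ... launch kernels as usual ... */
    return 0;
}

Of course this relies on the students cooperating and agreeing on who uses which device, which is exactly the weakness that compute-exclusive mode (below) removes.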

With the Linux tool nvidia-smi, you can put the GPUs in “compute-exclusive” mode, in which case each student's process will be bound to a different device when its CUDA context is initialized. Unfortunately, once all the devices are in use, new jobs will abort when they try to initialize a CUDA context. The only way to manage this would be to require students to submit jobs to a batch queue that runs no more concurrent jobs than there are CUDA devices.
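As a rough illustration of how a student program can cope with exclusive mode, the sketch below (my own convention, not an NVIDIA recipe) probes the devices in order and keeps the first one on which a context can actually be created. On current drivers the mode itself is set per GPU with something like nvidia-smi -i 0 -c EXCLUSIVE_PROCESS; older drivers used numeric mode values instead of the name.

/* Sketch, assuming the GPUs have been put in compute-exclusive mode.
   Tries each device in turn and keeps the first one that accepts a context. */
#include <cuda_runtime.h>
#include <stdio.h>

int acquire_free_device(void)
{
    int count = 0;
    cudaGetDeviceCount(&count);

    for (int dev = 0; dev < count; ++dev) {
        cudaSetDevice(dev);
        /* cudaFree(0) forces context creation; it fails if the device is
           already owned by another process in exclusive mode. */
        if (cudaFree(0) == cudaSuccess)
            return dev;
        cudaGetLastError();   /* clear the error before trying the next device */
    }
    return -1;  /* all devices busy */
}

int main(void)
{
    int dev = acquire_free_device();
    if (dev < 0) {
        fprintf(stderr, "No free CUDA device; try again later or use a batch queue.\n");
        return 1;
    }
    printf("Acquired device %d\n", dev);
    /* ... run the assignment ... */
    return 0;
}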

There is not, as far as I know, an existing device-assignment mechanism that load-balances many users evenly over the CUDA devices while still allowing several users to share a device when no idle one is available. That would have to be some kind of custom device-scheduler library that students call to obtain the “least loaded device”.
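If someone wanted to roll their own, a very crude starting point (a sketch only, assuming the GPUs stay in the default shared compute mode, and ignoring the locking a real multi-process scheduler would need) is to treat free memory as a proxy for load and pick the device with the most of it:

/* Rough "least loaded device" helper using free memory as a crude proxy for
   load. Not an NVIDIA-provided facility; a real scheduler would query
   utilization (e.g. via NVML) and coordinate between processes. */
#include <cuda_runtime.h>
#include <stdio.h>

int least_loaded_device(void)
{
    int count = 0, best = 0;
    size_t best_free = 0;
    cudaGetDeviceCount(&count);

    for (int dev = 0; dev < count; ++dev) {
        size_t free_mem = 0, total_mem = 0;
        cudaSetDevice(dev);
        if (cudaMemGetInfo(&free_mem, &total_mem) == cudaSuccess && free_mem > best_free) {
            best_free = free_mem;
            best = dev;
        }
    }
    cudaSetDevice(best);
    return best;
}

Note that simply querying each device this way creates a context (and allocates some memory) on every GPU, so it is only suitable as an illustration of the idea.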

Hello, have there been any new developments or best practices from NVIDIA since 2010? I would like to share my CUDA research server with my fellow PhD students.