Sharing a GPU server for CUDA programming in a multi-user operating system

We are going to set up a GPU server with two quad-core Xeon CPUs and four Tesla C2050 GPUs for the students of a graduate CUDA programming course. Several students should be able to use the server simultaneously and do their CUDA programming assignments on it. We have two options for the OS: Linux or Windows Server 2008 Enterprise Edition.
Here I have two questions:
1- Which of the two operating systems is better for this purpose, and why?
2- Is it possible for all students to log in to the server simultaneously (via SSH on Linux or Remote Desktop on Windows) and run their programming assignments at the same time? I wonder what happens in this scenario and how the four GPUs will be assigned to the users. As far as I know, each user will be able to use all four GPUs, and on these Fermi-based Tesla cards up to 16 kernels can execute concurrently. Am I right?

I have only worked with CUDA on a single-user Windows machine with a single GPU, so I have no idea what exactly happens on a multi-user machine in terms of GPU resource sharing.

Any advice you could give me would be greatly appreciated.
Thank you all for your help.

In the default boot-up configuration, if your students do not call cudaSetDevice() explicitly, they will all run their kernels on the same device. All CUDA devices support time slicing between different processes at kernel boundaries (though in the distant past there were bugs). Note that there is no preemption, and that the concurrent kernel execution supported on Fermi applies only to kernels issued from the same CUDA context in different streams; kernels from different user processes cannot run exactly simultaneously.
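For example, one simple convention is to have each student select a device explicitly before doing anything else. The sketch below is only an illustration; the STUDENT_GPU environment variable is a made-up name, not anything CUDA defines.

/* Minimal sketch: pick a device explicitly instead of defaulting to device 0.
   STUDENT_GPU is an illustrative convention, not a standard CUDA variable. */
#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        fprintf(stderr, "No CUDA devices found\n");
        return 1;
    }

    /* Each student exports e.g. STUDENT_GPU=2 before running their program. */
    const char *env = getenv("STUDENT_GPU");
    int dev = env ? atoi(env) % count : 0;

    cudaError_t err = cudaSetDevice(dev);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaSetDevice(%d) failed: %s\n", dev, cudaGetErrorString(err));
        return 1;
    }
    printf("Running on device %d of %d\n", dev, count);

    /* ... launch kernels as usual ... */
    return 0;
}

Of course this relies on the students cooperating and agreeing on who uses which device, which is exactly the weakness that compute-exclusive mode (below) removes.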

With the Linux tool nvidia-smi, you can put the GPUs in “compute-exclusive” mode, in which case each student's process will be bound to a different device when its CUDA context is initialized. Unfortunately, once all the devices are in use, new jobs will abort when they try to initialize a CUDA context. The only way to manage this would be to require students to submit jobs to a batch queue that runs no more concurrent jobs than there are CUDA devices.
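As a rough illustration of how a student program can cope with exclusive mode, the sketch below (my own convention, not an NVIDIA recipe) probes the devices in order and keeps the first one on which a context can actually be created. On current drivers the mode itself is set per GPU with something like nvidia-smi -i 0 -c EXCLUSIVE_PROCESS; older drivers used numeric mode values instead of the name.

/* Sketch, assuming the GPUs have been put in compute-exclusive mode.
   Tries each device in turn and keeps the first one that accepts a context. */
#include <cuda_runtime.h>
#include <stdio.h>

int acquire_free_device(void)
{
    int count = 0;
    cudaGetDeviceCount(&count);

    for (int dev = 0; dev < count; ++dev) {
        cudaSetDevice(dev);
        /* cudaFree(0) forces context creation; it fails if the device is
           already owned by another process in exclusive mode. */
        if (cudaFree(0) == cudaSuccess)
            return dev;
        cudaGetLastError();   /* clear the error before trying the next device */
    }
    return -1;  /* all devices busy */
}

int main(void)
{
    int dev = acquire_free_device();
    if (dev < 0) {
        fprintf(stderr, "No free CUDA device; try again later or use a batch queue.\n");
        return 1;
    }
    printf("Acquired device %d\n", dev);
    /* ... run the assignment ... */
    return 0;
}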

There is not, as far as I know, an existing device-assignment mechanism that load-balances many users evenly over the CUDA devices while still allowing several users to share a device when no idle one is available. That would have to be some kind of custom device-scheduler library that students call to obtain the “least loaded device”.
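If someone wanted to roll their own, a very crude starting point (a sketch only, assuming the GPUs stay in the default shared compute mode, and ignoring the locking a real multi-process scheduler would need) is to treat free memory as a proxy for load and pick the device with the most of it:

/* Rough "least loaded device" helper using free memory as a crude proxy for
   load. Not an NVIDIA-provided facility; a real scheduler would query
   utilization (e.g. via NVML) and coordinate between processes. */
#include <cuda_runtime.h>
#include <stdio.h>

int least_loaded_device(void)
{
    int count = 0, best = 0;
    size_t best_free = 0;
    cudaGetDeviceCount(&count);

    for (int dev = 0; dev < count; ++dev) {
        size_t free_mem = 0, total_mem = 0;
        cudaSetDevice(dev);
        if (cudaMemGetInfo(&free_mem, &total_mem) == cudaSuccess && free_mem > best_free) {
            best_free = free_mem;
            best = dev;
        }
    }
    cudaSetDevice(best);
    return best;
}

Note that simply querying each device this way creates a context (and allocates some memory) on every GPU, so it is only suitable as an illustration of the idea.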

Hello, have there been any new developments or best practices from NVIDIA since 2010? I would like to share my CUDA research server with my fellow PhD students.