Managing jobs in a multi-GPU system, with or without compute exclusive mode

We have a multi-GPU system, and we have found that using nvidia-smi to set compute exclusive mode on each GPU works well: each new job picks up the next available GPU. However, we are now moving to Fermi-based cards, and we know that Fermi-based cards are capable of supporting multiple jobs at the same time. Is it possible to manage the GPUs dynamically while still allowing one GPU to support more than one job? We are thinking of capping the number of concurrent jobs a GPU can support at 2. Is it possible for NVIDIA to add this kind of support? Or is there any command we can run to query the number of jobs running on each GPU?
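
(To make the last question concrete, here is a minimal sketch of the kind of query meant here: counting the compute processes that currently hold a context on each GPU. It assumes the NVML library shipped with recent drivers; nvidia-smi -q prints a similar per-GPU process list. The 64-entry buffer is an arbitrary choice, not something the API requires.)

```
// Count the compute processes currently running on each GPU via NVML.
// Build (library path may differ per install): gcc count_jobs.c -lnvidia-ml
#include <stdio.h>
#include <nvml.h>

int main(void) {
    if (nvmlInit() != NVML_SUCCESS) {
        fprintf(stderr, "NVML not available with this driver\n");
        return 1;
    }

    unsigned int ngpus = 0;
    nvmlDeviceGetCount(&ngpus);

    for (unsigned int i = 0; i < ngpus; ++i) {
        nvmlDevice_t dev;
        nvmlDeviceGetHandleByIndex(i, &dev);

        // On input: buffer capacity; on output: number of processes found.
        unsigned int nprocs = 64;
        nvmlProcessInfo_t procs[64];
        nvmlReturn_t rc = nvmlDeviceGetComputeRunningProcesses(dev, &nprocs, procs);
        if (rc == NVML_SUCCESS || rc == NVML_ERROR_INSUFFICIENT_SIZE)
            printf("GPU %u: %u compute process(es)\n", i, nprocs);
    }

    nvmlShutdown();
    return 0;
}
```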

Fermi can run multiple kernels at the same time, but only from streams belonging to the same context, which implies the same host thread. Nothing changes from a cluster-management point of view with Fermi compared to previous-generation GPUs.
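
(A minimal sketch of the case that does overlap on Fermi: one host thread, one context, several streams. Separate processes each get their own context, so their kernels are time-sliced by the driver rather than run concurrently. Block and grid sizes below are arbitrary.)

```
// Several kernels launched into different streams of the same context.
// On Fermi (compute capability 2.x) these launches can overlap as long as
// each one leaves some SMs free.
#include <cstdio>

__global__ void busy(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float v = data[i];
        for (int k = 0; k < 20000; ++k)   // spin so any overlap is visible in a profiler
            v = v * 1.0000001f + 0.5f;
        data[i] = v;
    }
}

int main(void) {
    const int n = 1 << 16, nstreams = 4;
    float *d = 0;
    cudaMalloc(&d, n * sizeof(float));
    cudaMemset(d, 0, n * sizeof(float));

    cudaStream_t s[nstreams];
    for (int i = 0; i < nstreams; ++i) cudaStreamCreate(&s[i]);

    // All launches come from the same host thread, hence the same context.
    for (int i = 0; i < nstreams; ++i)
        busy<<<8, 256, 0, s[i]>>>(d, n);

    cudaDeviceSynchronize();

    for (int i = 0; i < nstreams; ++i) cudaStreamDestroy(s[i]);
    cudaFree(d);
    printf("done\n");
    return 0;
}
```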

Yeah, I know what the Fermi documentation says. But it is able to switch contexts quickly. If you have a Fermi card, you can do a simple test by running several instances of the nbody example. On my GTX 470, if I run one instance I get around 450 GFLOPS; if I run two, I get around 250 GFLOPS each. I can run 3 or 4 instances at the same time and the total stays around 500 GFLOPS.

I find this feature (maybe unintended) ideal for us. A new job doesn’t have to wait for the current job to finish; it just splits the GPU resources. However, we don’t want too many jobs sharing the same GPU, since that not only drags everyone’s performance down but also uses up the video memory and could result in a crash. So we would like to restrict the maximum number of jobs that get scheduled. Our job scheduler can do that, but I don’t know what happens to GPU allocation if we turn compute exclusive mode off.

Suppose I have 4 GPUs, my job scheduler allows a maximum of 8 jobs, and I turn compute exclusive mode off. Job 1 will grab one GPU for sure. What about job 2: will it still grab a “free” GPU, or could it end up on the same GPU that job 1 is using? And supposing it does grab a free GPU, and jobs 3 and 4 do the same and take the remaining GPUs, what happens when jobs 5, 6, 7, and 8 arrive? I suppose job 5 will pick some GPU, but what about job 6? Can it tell that one GPU is being used by 2 jobs while the other 3 are each used by only 1 job, so that it will choose one of those 3 GPUs?

The cards have no direct concept of the job scheduler allocating tasks (apart from exclusive compute mode). Your job scheduler is the one that needs to be configured to distribute the load appropriately. The job scheduler I’ve used on a cluster preferentially sends jobs to free nodes. The scheduler may require tweaking to keep track of the physical load on the cards (or to just pick the card with the lowest number of jobs running).
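
(A sketch of the “lowest number of jobs” idea, assuming an NVML-capable driver and a CUDA version that honours CUDA_VISIBLE_DEVICES: a hypothetical prologue wrapper the scheduler could run in front of each job. Note that NVML’s device order is not guaranteed to match the CUDA runtime’s default enumeration, so a production version should match devices by PCI bus ID. A “no more than 2 jobs per GPU” cap would just be an extra check on best_load before the exec.)

```
// Hypothetical prologue wrapper: pick the GPU with the fewest compute
// processes, pin the job to it with CUDA_VISIBLE_DEVICES, then exec the job.
// Build: gcc pick_gpu.c -lnvidia-ml
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <nvml.h>

int main(int argc, char **argv) {
    if (argc < 2) {
        fprintf(stderr, "usage: %s <job binary> [args...]\n", argv[0]);
        return 1;
    }
    if (nvmlInit() != NVML_SUCCESS) return 1;

    unsigned int ngpus = 0, best = 0, best_load = ~0u;
    nvmlDeviceGetCount(&ngpus);

    for (unsigned int i = 0; i < ngpus; ++i) {
        nvmlDevice_t dev;
        nvmlDeviceGetHandleByIndex(i, &dev);

        unsigned int nprocs = 64;            // buffer capacity in, process count out
        nvmlProcessInfo_t procs[64];
        nvmlReturn_t rc = nvmlDeviceGetComputeRunningProcesses(dev, &nprocs, procs);
        if (rc != NVML_SUCCESS && rc != NVML_ERROR_INSUFFICIENT_SIZE) continue;
        if (nprocs < best_load) { best_load = nprocs; best = i; }
    }
    nvmlShutdown();

    char idx[16];
    snprintf(idx, sizeof idx, "%u", best);
    setenv("CUDA_VISIBLE_DEVICES", idx, 1);  // the job only sees the chosen GPU
    execvp(argv[1], argv + 1);
    perror("execvp");                        // reached only if the exec fails
    return 1;
}
```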

Yeah, I know. I was just trying to see whether I could achieve what I want without changing the job scheduler.

As an everyday user of a cluster of GPUs, I would seriously spam the admins every single day if they allowed more than one job to run on a single GPU (assuming jobs are scheduled by individual GPUs). Total GFLOPS doesn’t matter; you are still asking your users to wait roughly N times longer for their jobs to complete if you allow N jobs on a single GPU. Do you allow more than one job to share a CPU core in your cluster? (Hopefully that is a rhetorical question…)

The ability to let an extra job run on a GPU does have its applications. I understand that from the job owner’s perspective he wants his job to finish ASAP. But suppose your GPU cluster consists of nodes that each contain 4 GPUs: do you spam the admins every single day if they allow more than one job to run on a node? Or do you spam them if they allow more than one job to run on the entire cluster?

No, we don’t allow more than one job to share a CPU core. But I have tested a quad-core CPU with Hyper-Threading (so it appears to have 8 cores) driving 8 GPUs, and we didn’t see any problem. Maybe I didn’t look closely.

No. That’s the point of the scheduler: it allows the user to reserve the resources that they need. If I need a whole node’s worth of GPUs, I just request 1 node, 4ppn, and 4 graphics.
