How to limit the number of SMs used by a program?

I am running multiple compute-intensive tasks on an NVIDIA A800 GPU. Since the A800 does not have MIG (Multi-Instance GPU) support, I need to explore methods to limit the maximum number of SMs available to a single process (e.g., restricting a training task to 30% of SMs) without modifying kernel code.

This product page specifically calls out MIG for A800.

MIG or MPS are possible methods to limit SMs for a particular client/process.

Then there are the manual methods:

  • Setting the grid size (the number of blocks launched) smaller than the number of SMs

  • or reading the SM ID inside the kernel and returning immediately on SMs that should stay idle, combined with assigning work based on the SM ID instead of the block ID.

Thank you very much! Additionally, for GPUs that do not support MIG, is MPS the only way to limit SM usage per process?

MPS is the only method I’m aware of, barring the “manual” methods, which would require a change to kernel code, something you explicitly excluded:

NVIDIA GPUs by default have 1 GR (graphics) engine. GPUs supporting Multi-Instance GPU (MIG) have multiple GR engines. The GPU runs a single context on a GR engine at a time. Multiple GPU contexts are supported by time slicing the contexts on the GR engine. Reducing the number of SMs available (e.g. with a Green Context) for a single GR engine will only reduce the performance of that GR engine. It will not allow multiple GPU contexts to execute concurrently on the GR engine.

MIG allows the GPU to be physically partitioned into multiple GR engines. A GPU context can run on each GR engine. The number of SMs is defined by the size of the MIG gpu_instance and compute_instance.

Multi-Process Service (MPS) allows multiple GPU contexts to be hosted in a single GPU context. The data is isolated, but all GR engine resources can be shared. If any process faults on the GPU, the full MPS GPU context will fault. MPS mode allows the amount of resources to be assigned when each process is launched. The total resources across all processes may be oversubscribed.
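As a rough illustration of assigning resources at launch time, here is a minimal Python sketch that starts a client process with the `CUDA_MPS_ACTIVE_THREAD_PERCENTAGE` environment variable set. It assumes an MPS control daemon is already running on the machine (typically started with `nvidia-cuda-mps-control -d`). The helper names `mps_client_env` and `launch_capped` are made up for this example and are not part of any NVIDIA tool.

```python
import os
import subprocess


def mps_client_env(sm_percentage, base_env=None):
    """Return a copy of the environment with the MPS active-thread cap set.

    sm_percentage is the share of the GPU's execution resources the client
    may use, e.g. 30 for roughly 30%. (Hypothetical helper for illustration.)
    """
    if not 0 < sm_percentage <= 100:
        raise ValueError("sm_percentage must be in the range (0, 100]")
    # Copy so the caller's environment is never modified in place.
    env = dict(os.environ if base_env is None else base_env)
    # The MPS client reads this variable when the process starts using CUDA,
    # so it has to be set before the process is launched.
    env["CUDA_MPS_ACTIVE_THREAD_PERCENTAGE"] = str(sm_percentage)
    return env


def launch_capped(command, sm_percentage):
    """Start `command` as a child process with the MPS cap applied."""
    return subprocess.Popen(command, env=mps_client_env(sm_percentage))
```

For instance, `launch_capped(["python", "train.py"], 30)` would start a hypothetical `train.py` under a 30% cap, with no change to its kernels. Because the caps of different clients may add up to more than 100%, this limits each process individually rather than reserving SMs for it.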

As Greg indicated, CUDA green contexts are another method to control the SM count.
