Hi, I have a question that has been bugging me for a while. I’ve often read that, even at a low level, it’s impossible to deterministically assign work to specific GPU SMs. However, I’ve recently been working with NVIDIA MIG, and I’ve noticed that it allows deterministic assignment of GPCs thanks to the isolation it provides.
My question is: is there any way to assign GPCs deterministically without using MIG?
I’m not just asking about GPUs that officially support MIG — I’m also interested in systems like NVIDIA Orin, which have a multi-GPC architecture but don’t support MIG. Since the hardware architecture allows isolation, is there any way to leverage that deterministically, perhaps through CUDA, driver-level tweaks, or low-level APIs?
Could you clarify what you would like to achieve? What type of isolation are you requesting?
A GPC, General Processing Cluster (or Graphics Processing Cluster in GPU docs), is a collection of SMs, L1TEX units, constant caches, etc. When exercising the 3D pipeline, the GPC also includes instances of the world pipe, screen pipe, and ROP units.
When a GPU is not in MIG mode, the GPC provides no level of isolation. GPC isolation in MIG is at the gpu_instance level; multiple compute_instances may share a GPC’s resources.
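For reference, this split is visible in the NVML MIG API: a GPU instance is the unit that owns GPCs (plus a memory slice), while compute instances are carved out of a GPU instance and can share its SMs. Below is a minimal sketch, assuming a MIG-capable GPU with MIG mode already enabled, root privileges, and illustrative profile choices (error handling omitted):

```c
// Minimal sketch (NVML): one GPU instance = GPC-level isolation; two
// compute instances inside it share that GPU instance's SMs.
// Assumes MIG mode is already enabled (nvidia-smi -i 0 -mig 1).
// Build with: gcc mig_sketch.c -lnvidia-ml
#include <nvml.h>

int main(void) {
    nvmlInit();
    nvmlDevice_t dev;
    nvmlDeviceGetHandleByIndex(0, &dev);

    // GPU instances own whole GPCs plus a memory slice.
    nvmlGpuInstanceProfileInfo_t giInfo;
    nvmlDeviceGetGpuInstanceProfileInfo(dev, NVML_GPU_INSTANCE_PROFILE_2_SLICE,
                                        &giInfo);
    nvmlGpuInstance_t gi;
    nvmlDeviceCreateGpuInstance(dev, giInfo.id, &gi);

    // Two compute instances in the same GPU instance: isolated from other
    // GPU instances, but sharing this instance's GPC resources.
    nvmlComputeInstanceProfileInfo_t ciInfo;
    nvmlGpuInstanceGetComputeInstanceProfileInfo(
        gi, NVML_COMPUTE_INSTANCE_PROFILE_1_SLICE,
        NVML_COMPUTE_INSTANCE_ENGINE_PROFILE_SHARED, &ciInfo);
    nvmlComputeInstance_t ci1, ci2;
    nvmlGpuInstanceCreateComputeInstance(gi, ciInfo.id, &ci1);
    nvmlGpuInstanceCreateComputeInstance(gi, ciInfo.id, &ci2);

    nvmlShutdown();
    return 0;
}
```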
Are you trying to isolate because you expect a specific type of quality-of-service?
Are you trying to isolate to achieve a performance improvement through better locality?
CUDA Green Contexts provide a method to direct work to subsets of SMs. If you can show a specific need for GPC-level granularity, it may be useful to file a bug against CUDA requesting Green Context support for GPC-level SM allocation.
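As a rough illustration, here is a minimal sketch using the Green Context driver API (introduced in CUDA 12.4). The 16-SM minimum is a placeholder and error handling is omitted; note that the split is by SM count, not by GPC:

```c
// Minimal sketch: partition SMs with CUDA Green Contexts (driver API,
// CUDA 12.4+). Build with: nvcc green_ctx.c -lcuda
#include <cuda.h>
#include <stdio.h>

int main(void) {
    cuInit(0);
    CUdevice dev;
    cuDeviceGet(&dev, 0);

    // Query the device's full SM resource.
    CUdevResource smResource;
    cuDeviceGetDevResource(dev, &smResource, CU_DEV_RESOURCE_TYPE_SM);

    // Split off one group with at least 16 SMs; 'remaining' holds the rest.
    // The split is by SM count -- there is no GPC-level flag today.
    CUdevResource group, remaining;
    unsigned int nbGroups = 1;
    cuDevSmResourceSplitByCount(&group, &nbGroups, &smResource,
                                &remaining, 0, /*minCount=*/16);

    // Wrap the partition in a descriptor and create a green context on it.
    CUdevResourceDesc desc;
    cuDevResourceGenerateDesc(&desc, &group, 1);
    CUgreenCtx gctx;
    cuGreenCtxCreate(&gctx, desc, dev, CU_GREEN_CTX_DEFAULT_STREAM);

    // Work submitted to this stream runs only on the carved-out SMs.
    CUstream stream;
    cuGreenCtxStreamCreate(&stream, gctx, CU_STREAM_NON_BLOCKING, 0);

    printf("created green context over %u SM group(s)\n", nbGroups);

    cuStreamDestroy(stream);
    cuGreenCtxDestroy(gctx);
    return 0;
}
```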
On GH100 and later, Thread Block Clusters have GPC-level scope and may be usable to achieve some additional scheduling locality.
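A minimal sketch of a cluster launch, assuming an sm_90 (Hopper) device; the grid and cluster dimensions are illustrative:

```cuda
// Minimal sketch: thread block cluster launch (Hopper / sm_90+).
// Blocks in the same cluster are co-scheduled on one GPC.
// Build with: nvcc -arch=sm_90 cluster.cu
#include <cuda_runtime.h>
#include <cstdio>

__global__ void clusterKernel(int *out) {
    // Each block just records its index; cooperative_groups exposes
    // cluster rank/size if cross-block coordination is needed.
    if (threadIdx.x == 0) out[blockIdx.x] = blockIdx.x;
}

int main() {
    int *out;
    cudaMalloc(&out, 8 * sizeof(int));

    cudaLaunchConfig_t config = {};
    config.gridDim = dim3(8);     // 8 blocks total
    config.blockDim = dim3(128);  // 128 threads per block

    // Group blocks into clusters of 2; the hardware guarantees each
    // cluster is placed within a single GPC.
    cudaLaunchAttribute attr = {};
    attr.id = cudaLaunchAttributeClusterDimension;
    attr.val.clusterDim.x = 2;
    attr.val.clusterDim.y = 1;
    attr.val.clusterDim.z = 1;
    config.attrs = &attr;
    config.numAttrs = 1;

    cudaLaunchKernelEx(&config, clusterKernel, out);
    cudaDeviceSynchronize();
    cudaFree(out);
    return 0;
}
```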
My goal is to run neural network models concurrently on platforms like NVIDIA Orin, and I’m looking to achieve a level of isolation between them, similar to what MIG (Multi-Instance GPU) offers on data center GPUs like the A100 or H100.
I’ve been running some tests and noticed that even when using MIG, there doesn’t seem to be 100% isolation between processes. That’s why I’d like to better understand how MIG actually works under the hood — both from a hardware and CUDA scheduling perspective — to see if there’s a way I can tune or configure it to achieve stronger isolation for my use case.
To clarify, my question is less about overall performance and more about system guarantees — for example, making sure that one model’s execution doesn’t interfere in any way with another’s, especially in concurrent or real-time scenarios.
In short:
- I want to run multiple models in parallel without interference.
- MIG doesn’t seem to fully isolate workloads (at least in my experience).
- I’m trying to understand the current limits and the tools (like MIG, Green Contexts, or newer scheduling models) that might help achieve true isolation, especially on platforms like Orin or Ampere/Hopper-based GPUs.
If there’s any technical documentation or recommendations that go deeper into how resource partitioning and scheduling work in MIG, I’d really appreciate it.