How to schedule the kernel to a specified SM?

I’m currently working on deploying multiple machine learning models on a single GPU. While Nvidia’s offerings like Multi-Process Service (MPS) and Multi-Instance GPU (MIG) provide coarse-grained resource management, I want to know if there’s any fine-grained management at the kernel level, e.g., scheduling a kernel onto a specific partition of SMs.

I’ve come across AMD’s CU masking feature, which allows kernels to run on specific Compute Units (CUs). However, I’m curious if Nvidia GPUs offer a direct counterpart to AMD’s CU masking. Is there a native method within Nvidia’s ecosystem to specify which CUDA cores or SMs a kernel should utilize? If not, are there any implicit techniques or workarounds that achieve similar results?

other than what MPS offers to do this (somewhat indirectly), CUDA provides no functionality to direct a kernel’s execution to specific SMs.
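For reference, the indirect MPS route is the active thread percentage setting, which caps the fraction of the GPU’s execution resources an MPS client may use. It is coarse-grained (a percentage, not a mask of particular SMs). A minimal sketch, where the client binary name is a placeholder:

```shell
# Cap this MPS client at roughly half of the GPU's SMs.
# This limits *how many* SMs may be used, not *which* ones.
export CUDA_MPS_ACTIVE_THREAD_PERCENTAGE=50

# ./my_inference_server   # hypothetical client, launched under a running MPS daemon
```

The variable must be set in the environment of the client process before it creates its CUDA context, and the MPS control daemon must already be running.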

You could use an “implicit” technique by identifying the SM you are running on (by its ID; search these forums for mentions of smid) and then modifying your behavior. For example: if my SM ID is larger than a certain number, just exit. In this case, you may need to increase your block count or make other design changes to achieve a desired outcome. It would still be difficult to arrange for “complete, flexible specification and control” of SM utilization this way.
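A minimal sketch of that idea, assuming the workload and all parameter values are placeholders: each block reads the `%smid` special register and exits if it landed on an SM outside the desired range. Because exited blocks would otherwise leave gaps in the work, the surviving blocks pull tiles from a shared atomic counter instead of relying on `blockIdx`:

```cuda
// Read the SM ID of the current block via the %smid special register.
__device__ unsigned int smid() {
    unsigned int id;
    asm volatile("mov.u32 %0, %%smid;" : "=r"(id));
    return id;
}

// Hypothetical kernel: blocks that land on an SM with ID >= max_sm exit
// immediately; surviving blocks grab work tiles from a global counter so
// no elements are skipped.
__global__ void masked_kernel(float *data, int n,
                              unsigned int max_sm, unsigned int *next_tile) {
    if (smid() >= max_sm) return;   // opt out of undesired SMs

    __shared__ unsigned int tile;
    for (;;) {
        if (threadIdx.x == 0) tile = atomicAdd(next_tile, 1);
        __syncthreads();                        // all threads see the new tile
        unsigned int base = tile * blockDim.x;
        if (base >= (unsigned int)n) break;     // uniform exit for the block
        unsigned int i = base + threadIdx.x;
        if (i < (unsigned int)n) data[i] *= 2.0f;   // placeholder workload
        __syncthreads();                        // done before tile is reused
    }
}

int main() {
    const int n = 1 << 20;
    float *d;
    unsigned int *next_tile;
    cudaMalloc(&d, n * sizeof(float));
    cudaMalloc(&next_tile, sizeof(unsigned int));
    cudaMemset(next_tile, 0, sizeof(unsigned int));

    // Over-provision blocks: some will exit early on masked SMs, and the
    // rest drain the tile counter until the work is done.
    masked_kernel<<<128, 256>>>(d, n, /*max_sm=*/8, next_tile);
    cudaDeviceSynchronize();

    cudaFree(d);
    cudaFree(next_tile);
    return 0;
}
```

Note the caveats from above still apply: which SM a block lands on is decided by the hardware scheduler, so this only filters after the fact, and if no surviving block is ever scheduled the work is simply not done.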

Thanks for your reply. I have one concern about this method: if the kernel finds it isn’t on the desired SMs, it exits and relaunches(?). Is it possible that the kernel never runs on my desired SMs?

nothing like that happens automatically. If that is your desire, you would have to achieve that somehow with code. I’m not sure such a thing is possible.

perhaps, for example if the desired SMs are already in use

I probably “oversold” the implicit technique I mentioned. It probably cannot be used to directly do SM masking, at least not easily, and not without other code.