Sorry for bothering you!
I am writing an application that needs to process messages of two different kinds on a single GPU (T4 or V100). The first kind is rare but computationally expensive; the second is common and cheap to process. I have tried binding each type of message to a separate CUDA stream, but that leads to interference.
Someone told me that I can use MPS to assign a portion of GPU resources to each type of message, which would avoid the interference. After reading the official MPS docs, I realized that it only supports a multi-process model (correct me if I am wrong), which is very different from the multi-thread model I am currently using. Refactoring the whole project is hardly possible due to the heavy engineering effort.
While reading the MPS docs, I found a code snippet showing that I can create CUDA contexts with different SM counts in one process and launch kernels on different contexts. In my case, I could use one context for each type of message. Since my whole project uses the CUDA runtime API and I am not familiar with the CUDA driver API, I am wondering whether this is a good way to solve my problem. Will it lead to bad performance since there are multiple contexts on one GPU? Is there a better way to restrict the resources available to a kernel?
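For reference, this is roughly what I understood from the snippet (adapted from memory, so the RARE_SM_COUNT / COMMON_SM_COUNT constants are just my own placeholders for the two message types, and error checking is omitted):

```cpp
#include <cuda.h>

// Placeholder SM budgets for the two message types (my own choice).
#define RARE_SM_COUNT   32
#define COMMON_SM_COUNT 8

int main() {
    cuInit(0);
    CUdevice dev;
    cuDeviceGet(&dev, 0);

    CUexecAffinityParam affinity;
    affinity.type = CU_EXEC_AFFINITY_TYPE_SM_COUNT;

    // Context limited to RARE_SM_COUNT SMs for the rare/expensive messages.
    affinity.param.smCount.val = RARE_SM_COUNT;
    CUcontext rareCtx;
    cuCtxCreate_v3(&rareCtx, &affinity, 1, 0, dev);

    // Second context limited to COMMON_SM_COUNT SMs for the common/cheap messages.
    affinity.param.smCount.val = COMMON_SM_COUNT;
    CUcontext commonCtx;
    cuCtxCreate_v3(&commonCtx, &affinity, 1, 0, dev);

    // Kernels would then be launched after cuCtxSetCurrent(rareCtx) or
    // cuCtxSetCurrent(commonCtx), e.g. via cuLaunchKernel.

    cuCtxDestroy(commonCtx);
    cuCtxDestroy(rareCtx);
    return 0;
}
```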
Thank you so much!
Hello @user52911 and welcome to the NVIDIA developer forums.
I think this very CUDA-specific topic is better suited for the experts in this area, so I will take the liberty of moving it to the corresponding CUDA forum.
Thanks!
I think MPS is a good way to restrict GPU resources for a particular client. As you pointed out, it requires multi-process usage.
With the CUDA runtime API, and avoiding any use of the driver API, there is no practical method to make use of multiple contexts per process. The runtime expects to use the so-called “primary context”.
Within the runtime API, you can restrict a particular kernel call to only use a portion of the SMs: simply launch that kernel with no more blocks than the number of SMs you want it to occupy. For example, if you have a GTX 1660 Super with 22 SMs and you launch a kernel with 16 blocks, there will be (at least) 6 SMs that are unoccupied. If you launch another kernel, it will be able to use (at least) those 6 SMs, without interference.
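Here is a minimal sketch of that idea, assuming your T4 and two placeholder kernels standing in for the two message types (the 3:1 split is arbitrary; error checking omitted):

```cpp
#include <cuda_runtime.h>

// Placeholder kernels standing in for the two message types.
__global__ void rareExpensiveKernel() { /* ... */ }
__global__ void commonCheapKernel()   { /* ... */ }

int main() {
    int smCount = 0;
    cudaDeviceGetAttribute(&smCount, cudaDevAttrMultiProcessorCount, 0);

    cudaStream_t rareStream, commonStream;
    cudaStreamCreate(&rareStream);
    cudaStreamCreate(&commonStream);

    // Give the expensive message type most of the SMs and leave the rest
    // for the cheap one. With no more blocks than SMs, the block scheduler
    // typically places at most one block per SM, so roughly
    // (smCount - rareBlocks) SMs remain available for the other kernel.
    int rareBlocks   = (smCount * 3) / 4;
    int commonBlocks = smCount - rareBlocks;

    rareExpensiveKernel<<<rareBlocks, 256, 0, rareStream>>>();
    commonCheapKernel<<<commonBlocks, 256, 0, commonStream>>>();

    cudaDeviceSynchronize();
    return 0;
}
```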
This is difficult to orchestrate in a complex setting where many kernels are being launched, but if you have just two things to partition, as you seem to be indicating, it might be something to consider. It is probably also less than optimal compared to MPS, because it may be difficult or impossible to reach full GPU occupancy this way, so there may be overall performance implications. But partitioning a system with MPS generally has performance implications as well.
Finally, CUDA stream priority may be something to consider.
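A sketch of that, for completeness (which message type should get the high-priority stream depends on which one is latency-sensitive in your application; error checking omitted):

```cpp
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    // Numerically lower values are higher priority, so greatestPriority
    // is the numerically smallest value in the range.
    int leastPriority = 0, greatestPriority = 0;
    cudaDeviceGetStreamPriorityRange(&leastPriority, &greatestPriority);

    cudaStream_t highPrio, lowPrio;
    cudaStreamCreateWithPriority(&highPrio, cudaStreamNonBlocking, greatestPriority);
    cudaStreamCreateWithPriority(&lowPrio,  cudaStreamNonBlocking, leastPriority);

    printf("stream priority range: %d (lowest) .. %d (highest)\n",
           leastPriority, greatestPriority);

    // Launch the latency-sensitive message type's kernels into highPrio
    // and the other type's kernels into lowPrio.

    cudaStreamDestroy(highPrio);
    cudaStreamDestroy(lowPrio);
    return 0;
}
```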