How to use CUDA_VISIBLE_DEVICES for MIG instances

Hi there!
I am new to Multi-Instance GPU (MIG), the new feature of the A100, and I want to use it to optimize my application. The application uses MPI, so it contains code like cudaSetDevice(rank%8). After splitting each of the original GPUs into two MIG instances, I wanted to change my code as little as possible, so I changed the call above to cudaSetDevice(rank%16) and set CUDA_VISIBLE_DEVICES={UUID of each MIG}. However, only the first MIG instance is found.
How can CUDA_VISIBLE_DEVICES be applied to MIG instances? If it cannot, what is the alternative way to use MIG with MPI?

https://docs.nvidia.com/datacenter/tesla/mig-user-guide/index.html#cuda-visible-devices

Thanks for your reply. I have read the instructions in the manual. My problem is that with an environment setup like CUDA_VISIBLE_DEVICES=MIG-aa,MIG-bb, cudaSetDevice(1) fails and cudaGetDeviceCount() returns 1.
What can I do in this situation?
My CUDA version is 11.4.

From the previously linked doc section:

  1. CUDA can only enumerate a single compute instance

You may also wish to familiarize yourself with the terminology and partitioning sections of that guide.

If you wish, you can create a multi-process application (perhaps for example using MPI) and assign one compute instance or GPU instance to each MPI rank, using a setting for CUDA_VISIBLE_DEVICES such that each MPI rank “sees” a different compute instance or GPU instance. In this way, each MPI rank will indeed see only a single CUDA enumerated device, and indeed each MPI rank will observe that

That is the idea. It may very well require changes to your application.
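
One way to do the per-rank assignment described above without touching much application code is a small launcher wrapper. What follows is only a sketch under stated assumptions, not an official NVIDIA recipe: it assumes the MPI launcher exports the node-local rank in one of the listed environment variables (Open MPI, MVAPICH2, and Slurm each set one of them), and it reads the MIG UUIDs from a variable named MIG_UUIDS, which is a hypothetical name invented for this example. The script name mig_launch.py is likewise made up.

```python
import os
import sys

# Environment variables that common launchers set to the node-local rank.
# Assumption: your launcher sets one of these; extend the tuple if it does not.
LOCAL_RANK_VARS = (
    "OMPI_COMM_WORLD_LOCAL_RANK",  # Open MPI
    "MV2_COMM_WORLD_LOCAL_RANK",   # MVAPICH2
    "SLURM_LOCALID",               # Slurm (srun)
)


def local_rank(env):
    """Return the node-local rank found in env, or raise if none is set."""
    for name in LOCAL_RANK_VARS:
        if name in env:
            return int(env[name])
    raise RuntimeError("no local-rank variable found; tried " + ", ".join(LOCAL_RANK_VARS))


def env_for_rank(env, mig_uuids):
    """Copy env and expose exactly one MIG device to this rank."""
    if not mig_uuids:
        raise ValueError("empty MIG UUID list")
    child = dict(env)
    child["CUDA_VISIBLE_DEVICES"] = mig_uuids[local_rank(env) % len(mig_uuids)]
    return child


if __name__ == "__main__" and len(sys.argv) > 1:
    # MIG_UUIDS is a hypothetical variable for this sketch: a comma-separated
    # list of the MIG device UUIDs on this node, as printed by `nvidia-smi -L`.
    uuids = [u for u in os.environ.get("MIG_UUIDS", "").split(",") if u]
    # Replace this process with the real application, which now sees one device.
    os.execvpe(sys.argv[1], sys.argv[1:], env_for_rank(os.environ, uuids))
```

With Open MPI this could be started as something like `mpirun -x MIG_UUIDS -np 16 python3 mig_launch.py ./my_app`, after exporting MIG_UUIDS with the 16 UUIDs. Because every rank then enumerates exactly one device, the call in the application becomes cudaSetDevice(0) instead of cudaSetDevice(rank%16). The variable has to be set before the application initializes CUDA, which is why the sketch sets it in a wrapper rather than inside the program.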

Many thanks, I get your point now!
