I have an AMD EPYC 7702P 64-core system that is set up as 4 NUMA domains (one per chiplet).
I also have 4 GPUs installed, connected in pairs by NVLink (two links per pair, hence the NV2 entries below).
I would like one GPU to be used per NUMA domain, but when I inspect the topology with nvidia-smi I see this:
$ nvidia-smi topo -m
        GPU0  GPU1  GPU2  GPU3  CPU Affinity    NUMA Affinity
GPU0     X    NV2   SYS   SYS   48-63,112-127   3
GPU1    NV2    X    PHB   SYS   16-31,80-95     1
GPU2    SYS   PHB    X    NV2   16-31,80-95     1
GPU3    SYS   SYS   NV2    X    0-15,64-79      0
This seems very strange to me: GPU1 and GPU2 both report affinity to NUMA node 1, while node 2 and all of its cores are skipped entirely.
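To rule out nvidia-smi itself, I also cross-checked what NUMA node the kernel assigns to each GPU directly in sysfs (this is the standard sysfs PCI layout; 0x10de is NVIDIA's PCI vendor ID):

```shell
# Print each NVIDIA PCI device's address and its kernel-assigned NUMA node.
# -1 means the kernel did not associate the device with any node.
for dev in /sys/bus/pci/devices/*; do
  [ -e "$dev/vendor" ] || continue
  if [ "$(cat "$dev/vendor")" = "0x10de" ]; then
    printf '%s -> NUMA node %s\n' "${dev##*/}" "$(cat "$dev/numa_node")"
  fi
done
```

It reports the same pairing, so the affinity shown above appears to come from the kernel/firmware, not from the NVIDIA tools.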
Is there any way to manually set each GPU to a NUMA domain?
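For what it's worth, I've seen suggestions that the numa_node attribute in sysfs is writable by root on recent kernels, so something like the sketch below might override it per device (the bus ID 0000:41:00.0 is a placeholder for illustration, not my actual topology, and I haven't confirmed whether the change persists or is honored by the driver):

```shell
# Attempt to override the kernel's NUMA assignment for one GPU (requires root).
# NOTE: 0000:41:00.0 is a placeholder -- substitute the bus ID reported by
# "nvidia-smi -q" or lspci for the GPU you want to move.
DEV=/sys/bus/pci/devices/0000:41:00.0
if [ -w "$DEV/numa_node" ]; then
  echo 2 > "$DEV/numa_node"   # claim this GPU belongs to NUMA node 2
else
  echo "numa_node not writable (or device absent): $DEV" >&2
fi
```

Is that a sane approach, or is there a supported way (BIOS setting, kernel parameter, driver option) to fix the mapping?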