Following https://forums.developer.nvidia.com/t/how-to-change-cpu-affinity-in-nvidia-smi-topo/190990 and https://stackoverflow.com/questions/55364149/understanding-nvidia-smi-topo-m-output (especially the awesome figure) I am trying to make sense of my output from ‘nvidia-smi topo -m’.
        GPU0    GPU1    GPU2    GPU3    mlx5_0  CPU Affinity    NUMA Affinity
GPU0     X      NV2     NV2     NV2     SYS     0-3,7-9,13-15   0
GPU1    NV2      X      NV2     NV2     SYS     0-3,7-9,13-15   0
GPU2    NV2     NV2      X      NV2     NODE    24-27,31-33     2
GPU3    NV2     NV2     NV2      X      NODE    24-27,31-33     2
mlx5_0  SYS     SYS     NODE    NODE     X
This is the output from one of our Volta nodes.
I understand that this is 4 GPUs connected by NVLink across 2 NUMA nodes.
It is the CPU Affinity column I am trying to get to grips with.
In previous years I had a script parsing this output to assign a CPU “controller” to each GPU (which I guess I can still do), but in those days it was more intuitive which CPU was closest to each GPU, because the core numbers were consecutive or regularly strided. The CPU Affinity column above feels unintuitive, especially as the node has 48 CPUs.
Can you explain this nvidia-smi output and advise on how best to match a CPU to each GPU, when that CPU only acts as a controller and the remaining CPUs do other tasks in the background?
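For context, here is a minimal Python sketch of the kind of script I mean. The function names (parse_cpu_list, gpu_cpu_affinity, read_topo, pin_to_first_core) are made up for illustration. It assumes the plain-text layout shown above (one header line, then one row per device, with the CPU Affinity value right after the device columns) and a Linux node. Taking the first listed core as the controller is an arbitrary choice, not a recommendation.

```python
import os
import subprocess

# Sample rows copied from the `nvidia-smi topo -m` output in this post.
SAMPLE = """\
        GPU0    GPU1    GPU2    GPU3    mlx5_0  CPU Affinity    NUMA Affinity
GPU0     X      NV2     NV2     NV2     SYS     0-3,7-9,13-15   0
GPU1    NV2      X      NV2     NV2     SYS     0-3,7-9,13-15   0
GPU2    NV2     NV2      X      NV2     NODE    24-27,31-33     2
GPU3    NV2     NV2     NV2      X      NODE    24-27,31-33     2
mlx5_0  SYS     SYS     NODE    NODE     X
"""


def parse_cpu_list(spec):
    """Expand a list such as '0-3,7-9' into [0, 1, 2, 3, 7, 8, 9]."""
    cpus = []
    for part in spec.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus


def gpu_cpu_affinity(topo_text):
    """Map each GPU row name to the list of cores in its CPU Affinity column."""
    lines = [ln for ln in topo_text.splitlines() if ln.strip()]
    header = lines[0].split()
    # Assumption: every header token before "CPU" names one device column.
    n_dev = header.index("CPU")
    affinity = {}
    for ln in lines[1:]:
        tok = ln.split()
        # Rows without an affinity entry (e.g. the NIC row) are skipped.
        if tok[0].startswith("GPU") and len(tok) > n_dev + 1:
            affinity[tok[0]] = parse_cpu_list(tok[n_dev + 1])
    return affinity


def read_topo():
    """Return the live output of `nvidia-smi topo -m` (needs the NVIDIA driver)."""
    return subprocess.run(
        ["nvidia-smi", "topo", "-m"],
        capture_output=True, text=True, check=True,
    ).stdout


def pin_to_first_core(gpu, affinity):
    """Pin the calling process to the first core listed for `gpu` (Linux only)."""
    os.sched_setaffinity(0, {affinity[gpu][0]})


print(gpu_cpu_affinity(SAMPLE)["GPU2"])  # [24, 25, 26, 27, 31, 32, 33]
```

On a real node the call would be gpu_cpu_affinity(read_topo()), followed by pin_to_first_core("GPU2", ...) in the process that drives GPU2.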