I am testing CPU-GPU bandwidth on Grace Hopper with the nvbandwidth tool listed in the "NVIDIA Grace Performance Tuning Guide".
The baseline test (no MIG) looks fine:
SUM host_to_device_memcpy_ce 371.24
SUM device_to_host_memcpy_ce 297.26
Then I ran the same test on MIG devices, with only one test running on one device at a time. The bandwidth dropped by almost 10x no matter which profile was chosen. The results are similar even with profile 0 (7g.96gb), in which case a single MIG instance uses the whole GPU:
SUM host_to_device_memcpy_ce 46.18
SUM device_to_host_memcpy_ce 43.21
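For reference, this is roughly how the MIG run was set up. This is a sketch of the assumed command sequence, not the exact invocation I used; the GPU index, profile ID, and MIG UUID are placeholders to adapt to your system:

```shell
# Assumed reproduction steps (placeholders: GPU index 0, MIG UUID).
# Enable MIG mode on the GPU:
sudo nvidia-smi -i 0 -mig 1

# Create a GPU instance with profile 0 (7g.96gb) plus its compute instance:
sudo nvidia-smi mig -i 0 -cgi 0 -C

# List devices to obtain the MIG UUID:
nvidia-smi -L

# Run nvbandwidth against the single MIG instance (UUID is a placeholder):
CUDA_VISIBLE_DEVICES=MIG-xxxxxxxx ./nvbandwidth -t host_to_device_memcpy_ce
CUDA_VISIBLE_DEVICES=MIG-xxxxxxxx ./nvbandwidth -t device_to_host_memcpy_ce
```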
Is this behavior expected? I would have expected the C2C bandwidth to be shared in some way among MIG devices, either statically or dynamically. Any advice?