Grace Hopper CPU-GPU bandwidth with MIG

I am testing the CPU-GPU bandwidth on Grace Hopper with the nvbandwidth tool listed in the “NVIDIA Grace Performance Tuning Guide”.

The baseline test looks fine (bandwidth in GB/s):
SUM host_to_device_memcpy_ce 371.24
SUM device_to_host_memcpy_ce 297.26
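
For reference, this is roughly how I collect those numbers (a minimal sketch in Python; it assumes the nvbandwidth binary is on PATH and that its -t flag selects testcases by name, as in the build I made from github.com/NVIDIA/nvbandwidth):

```python
#!/usr/bin/env python3
"""Sketch: run the two copy-engine tests and print the SUM lines.

Assumes `nvbandwidth` is on PATH and `-t` selects testcases by name.
"""
import subprocess

TESTS = ["host_to_device_memcpy_ce", "device_to_host_memcpy_ce"]

for test in TESTS:
    out = subprocess.run(
        ["nvbandwidth", "-t", test],
        capture_output=True, text=True, check=True,
    ).stdout
    # nvbandwidth prints one "SUM <testname> <GB/s>" line per testcase
    for line in out.splitlines():
        if line.startswith("SUM"):
            print(line)
```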

Then I ran the same tests on MIG devices, with only one test running on one MIG device at a time. The bandwidth dropped by almost 10x no matter which profile I chose. The results are similar even with profile 0 (7g.96gb), in which case a single MIG device owns the whole GPU.
SUM host_to_device_memcpy_ce 46.18
SUM device_to_host_memcpy_ce 43.21
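
For completeness, here is roughly how I run the per-MIG-device tests, one instance at a time (again a sketch in Python; it assumes the MIG instances already exist, that nvidia-smi -L lists their UUIDs, and that setting CUDA_VISIBLE_DEVICES to a MIG UUID restricts nvbandwidth to that single device):

```python
#!/usr/bin/env python3
"""Sketch: run nvbandwidth against each MIG device in turn.

Assumes MIG is enabled and instances are created, that `nvidia-smi -L`
lists the MIG UUIDs, and that CUDA_VISIBLE_DEVICES accepts a MIG UUID.
"""
import os
import re
import subprocess

# Extract MIG UUIDs from `nvidia-smi -L` output lines such as:
#   MIG 7g.96gb Device 0: (UUID: MIG-...)
listing = subprocess.run(
    ["nvidia-smi", "-L"], capture_output=True, text=True, check=True
).stdout
mig_uuids = re.findall(r"UUID:\s*(MIG-[^)]+)", listing)

for uuid in mig_uuids:
    # Expose only this one MIG device to nvbandwidth
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=uuid)
    for test in ["host_to_device_memcpy_ce", "device_to_host_memcpy_ce"]:
        out = subprocess.run(
            ["nvbandwidth", "-t", test],
            capture_output=True, text=True, check=True, env=env,
        ).stdout
        for line in out.splitlines():
            if line.startswith("SUM"):
                print(uuid, line)
```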

Is this behavior expected? I would have expected the C2C bandwidth to be shared among MIG devices in some way, either statically or dynamically. Any advice?

I’m looking into it. It will probably be a few days before I can reply again.

I believe I have reproduced your observation and have filed an internal bug at NVIDIA (4617666) so others can take a look at it. I don’t know when I will be able to respond further.